path: root/include
Age    Commit message    Author
2018-06-22  hw/arm/virt: Add a new 256MB ECAM region  (Eric Auger)
This patch defines a new ECAM region located after the 256GB limit. The virt machine state is augmented with a new highmem_ecam field which guards the usage of this new ECAM region instead of the legacy 16MB one. With the highmem ECAM region, up to 256 PCIe buses can be used. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Message-id: 1529072910-16156-9-git-send-email-eric.auger@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-06-22  hw/arm/virt: GICv3 DT node with one or two redistributor regions  (Eric Auger)
This patch allows the creation of a GICv3 node with 1 or 2 redistributor regions depending on the number of smp_cpus. The second redistributor region is located just after the existing RAM region, at 256GB, and contains up to 512 vcpus. Please refer to the kernel documentation for further node details: Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.txt Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Message-id: 1529072910-16156-6-git-send-email-eric.auger@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-06-22  hw/intc/arm_gicv3: Introduce redist-region-count array property  (Eric Auger)
To prepare for multiple redistributor regions, we introduce an array of uint32_t properties that stores the redistributor count of each redistributor region. Non-accelerated VGICv3 only supports a single redistributor region. The capacity of all redist regions is checked against the number of vcpus. Machvirt is updated to set those properties, i.e. a single redistributor region with count set to the number of vcpus capped at 123. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Message-id: 1529072910-16156-4-git-send-email-eric.auger@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
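As a rough illustration of the capacity check described above, a minimal standalone sketch (the function and variable names here are assumptions for the example, not the actual machvirt/GICv3 code) that sums the per-region counts and compares them against the vCPU count:

```c
/* Illustrative sketch only -- not the actual QEMU/machvirt code.
 * Checks that the configured redistributor regions can hold every vCPU. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static bool redist_capacity_ok(const uint32_t *region_counts,
                               unsigned nb_regions, unsigned nb_vcpus)
{
    uint64_t capacity = 0;

    for (unsigned i = 0; i < nb_regions; i++) {
        capacity += region_counts[i];
    }
    return capacity >= nb_vcpus;
}

int main(void)
{
    /* Hypothetical setup: one legacy region capped at 123 redistributors. */
    uint32_t counts[] = { 123 };

    printf("fits 96 vcpus:  %d\n", redist_capacity_ok(counts, 1, 96));
    printf("fits 200 vcpus: %d\n", redist_capacity_ok(counts, 1, 200));
    return 0;
}
```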
2018-06-22  linux-headers: Update to kernel mainline commit b357bf602  (Eric Auger)
Update our kernel headers to mainline commit b357bf6023a948cf6a9472f07a1b0caac0e4f8e8 ("Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm") Signed-off-by: Eric Auger <eric.auger@redhat.com> Message-id: 1529072910-16156-2-git-send-email-eric.auger@redhat.com [PMM: clarified commit message] Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-06-22  Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2018-06-20-v2' ...  (Peter Maydell)
into staging nbd patches for 2018-06-20 Add experimental x-nbd-server-add-bitmap to expose a disabled bitmap over NBD, in preparation for a pull model incremental backup scheme. Also fix a corner case protocol issue with NBD_CMD_BLOCK_STATUS, and add new NBD_CMD_CACHE. - Eric Blake: tests: Simplify .gitignore - Eric Blake: nbd/server: Reject 0-length block status request - Vladimir Sementsov-Ogievskiy: 0/6 NBD export bitmaps - Vladimir Sementsov-Ogievskiy: nbd/server: introduce NBD_CMD_CACHE # gpg: Signature made Thu 21 Jun 2018 15:53:55 BST # gpg: using RSA key A7A16B4A2527436A # gpg: Good signature from "Eric Blake <eblake@redhat.com>" # gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" # gpg: aka "[jpeg image of size 6874]" # Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A * remotes/ericb/tags/pull-nbd-2018-06-20-v2: nbd/server: introduce NBD_CMD_CACHE docs/interop: add nbd.txt qapi: new qmp command nbd-server-add-bitmap nbd/server: implement dirty bitmap export nbd/server: add nbd_meta_empty_or_pattern helper nbd/server: refactor NBDExportMetaContexts nbd/server: fix trace nbd/server: Reject 0-length block status request tests: Simplify .gitignore Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-06-22  spapr: Use maximum page size capability to simplify memory backend checking  (David Gibson)
The way we used to handle KVM allowable guest pagesizes for PAPR guests required some convoluted checking of memory attached to the guest. The allowable pagesizes advertised to the guest cpus depended on the memory which was attached at boot, but then we needed to ensure that any memory later hotplugged didn't change which pagesizes were allowed. Now that we have an explicit machine option to control the allowable maximum pagesize we can simplify this. We just check all memory backends against that declared pagesize. We check base and cold-plugged memory at reset time, and hotplugged memory at pre_plug() time. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org>
2018-06-22  spapr: Maximum (HPT) pagesize property  (David Gibson)
The way the POWER Hash Page Table (HPT) MMU is virtualized by KVM HV means that every page that the guest puts in the pagetables must be truly physically contiguous, not just GPA-contiguous. In effect this means that an HPT guest can't use any pagesizes greater than the host page size used to back its memory. At present we handle this by changing what we advertise to the guest based on the backing pagesizes. This is pretty bad, because it means the guest sees a different environment depending on what should be host configuration details. As a start on fixing this, we add a new capability parameter to the pseries machine type which gives the maximum allowed pagesizes for an HPT guest. For now we just create and validate the parameter without making it do anything. For backwards compatibility, on older machine types we set it to the max available page size for the host. For the 3.0 machine type, we fix it to 16, the intention being to only allow HPT pagesizes up to 64kiB by default in future. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org>
2018-06-21  Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20180615' into staging  (Peter Maydell)
TCG patch queue: Workaround macos assembler lossage. Eliminate tb_lock. Fix TB code generation overflow. # gpg: Signature made Fri 15 Jun 2018 20:40:56 BST # gpg: using RSA key 64DF38E8AF7E215F # gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" # Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A 05C0 64DF 38E8 AF7E 215F * remotes/rth/tags/pull-tcg-20180615: tcg: Reduce max TB opcode count tcg: remove tb_lock translate-all: remove tb_lock mention from cpu_restore_state_from_tb cputlb: remove tb_lock from tlb_flush functions translate-all: protect TB jumps with a per-destination-TB lock translate-all: discard TB when tb_link_page returns an existing matching TB translate-all: introduce assert_no_pages_locked translate-all: add page_locked assertions translate-all: use per-page locking in !user-mode translate-all: move tb_invalidate_phys_page_range up in the file translate-all: work page-by-page in tb_invalidate_phys_range_1 translate-all: remove hole in PageDesc translate-all: make l1_map lockless translate-all: iterate over TBs in a page with PAGE_FOR_EACH_TB tcg: move tb_ctx.tb_phys_invalidate_count to tcg_ctx tcg: track TBs with per-region BST's qht: return existing entry when qht_insert fails qht: require a default comparison function tcg/i386: Use byte form of xgetbv instruction Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-06-21  nbd/server: introduce NBD_CMD_CACHE  (Vladimir Sementsov-Ogievskiy)
Handle the NBD CACHE command: just do the read, without sending the read data back. The cache mechanism itself should be provided by the exported node's driver chain. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180413143156.11409-1-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: fix two missing case labels in switch statements] Signed-off-by: Eric Blake <eblake@redhat.com>
2018-06-21  nbd/server: implement dirty bitmap export  (Vladimir Sementsov-Ogievskiy)
Handle a new NBD meta namespace: "qemu", and corresponding queries: "qemu:dirty-bitmap:<export bitmap name>". With the new metadata context negotiated, BLOCK_STATUS query will reply with dirty-bitmap data, converted to extents. The new public function nbd_export_bitmap selects which bitmap to export. For now, only one bitmap may be exported. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180609151758.17343-5-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: wording tweaks, minor cleanups, additional tracing] Signed-off-by: Eric Blake <eblake@redhat.com>
2018-06-21  ppc4xx_i2c: Implement directcntl register  (BALATON Zoltan)
As well as being able to generate its own i2c transactions, the ppc4xx i2c controller has a DIRECTCNTL register which allows explicit control of the i2c lines. Using this register an OS can directly bitbang i2c operations. In order to let emulated i2c devices respond to this, we need to wire up the DIRECTCNTL register to qemu's bitbanged i2c handling code. Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
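A rough sketch of that wiring (the bit positions and helper names below are assumptions made for the example, not the real ppc4xx_i2c or bitbang_i2c code): a DIRECTCNTL write extracts the SCL/SDA drive bits, forwards them to the bitbang engine, and the read-back reflects the observed line levels.

```c
/* Sketch of forwarding register-driven SCL/SDA bits to a bitbang i2c engine.
 * Bit positions and helpers are assumptions, not the real device code. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define DC_SCL_DRIVE (1u << 3)   /* assumed: master drives SCL high when set */
#define DC_SDA_DRIVE (1u << 2)   /* assumed: master drives SDA high when set */
#define DC_SCL_STATE (1u << 1)   /* assumed: current SCL level (read-only) */
#define DC_SDA_STATE (1u << 0)   /* assumed: current SDA level (read-only) */

/* Stand-in for the bitbang engine: returns the resulting SDA level, since an
 * attached slave may pull the line low (e.g. to ACK). */
static bool bitbang_set_lines(bool scl, bool sda)
{
    (void)scl;
    return sda;   /* no slave attached in this sketch: line follows the master */
}

static uint32_t directcntl_write(uint32_t val)
{
    bool scl = val & DC_SCL_DRIVE;
    bool sda = bitbang_set_lines(scl, val & DC_SDA_DRIVE);

    /* Read-back: the drive bits as written plus the observed line states. */
    return (val & (DC_SCL_DRIVE | DC_SDA_DRIVE))
           | (scl ? DC_SCL_STATE : 0)
           | (sda ? DC_SDA_STATE : 0);
}

int main(void)
{
    printf("reg = 0x%x\n",
           (unsigned)directcntl_write(DC_SCL_DRIVE | DC_SDA_DRIVE));
    return 0;
}
```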
2018-06-21  ppc4xx_i2c: Remove unimplemented sdata and intr registers  (BALATON Zoltan)
We don't emulate slave mode, so the related registers are not needed. [lh]sadr are only retained to avoid too many warnings and to simplify debugging, but sdata is not even correct because the device has a 4-byte FIFO instead, so just remove this unimplemented register for now. The intr register is also not implemented correctly; it is for diagnostics and normally not even visible on the device without explicitly enabling it. As no guests are known to need it, remove it as well. Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-21  spapr: remove unused spapr_irq routines  (Cédric Le Goater)
spapr_irq_alloc_block() and spapr_irq_alloc() are now deprecated. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-21  spapr: split the IRQ allocation sequence  (Cédric Le Goater)
Today, when a device requests an IRQ number in a sPAPR machine, the spapr_irq_alloc() routine first scans the ICSState status array to find an empty slot and then performs the assignment of the selected numbers. Split this sequence into two distinct routines: spapr_irq_find() for lookups and spapr_irq_claim() for claiming the IRQ numbers. This will ease the introduction of a static layout of IRQ numbers. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
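A minimal sketch of the find/claim split (the flat in-use array and names are assumptions for the example, not the sPAPR ICS code): the lookup returns a candidate number with no side effects, and a separate claim step marks it as in use.

```c
/* Illustrative sketch of splitting IRQ allocation into find + claim.
 * Not the actual spapr_irq_find()/spapr_irq_claim() implementation. */
#include <stdbool.h>
#include <stdio.h>

#define NR_IRQS 64

static bool irq_in_use[NR_IRQS];

/* Lookup only: return a free IRQ number, or -1, without claiming it. */
static int irq_find(void)
{
    for (int i = 0; i < NR_IRQS; i++) {
        if (!irq_in_use[i]) {
            return i;
        }
    }
    return -1;
}

/* Claim a specific IRQ number; fails if it is out of range or already taken. */
static bool irq_claim(int irq)
{
    if (irq < 0 || irq >= NR_IRQS || irq_in_use[irq]) {
        return false;
    }
    irq_in_use[irq] = true;
    return true;
}

int main(void)
{
    int irq = irq_find();

    if (irq >= 0 && irq_claim(irq)) {
        printf("claimed IRQ %d\n", irq);
    }
    /* A static IRQ layout can instead call irq_claim() directly on fixed numbers. */
    return 0;
}
```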
2018-06-21  spapr: Add cpu_apply hook to capabilities  (David Gibson)
spapr capabilities have an apply hook to actually activate (or deactivate) the feature in the system at reset time. However, a number of capabilities affect the setup of cpus, and need to be applied to each of them - including hotplugged cpus for extra complication. To make this simpler, add an optional cpu_apply hook that is called from spapr_cpu_reset(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org>
2018-06-21  spapr: Compute effective capability values earlier  (David Gibson)
Previously, the effective values of the various spapr capability flags were only determined at machine reset time. That was a lazy way of making sure it was after cpu initialization so it could use the cpu object to inform the defaults. But we've now improved the compat checking code so that we don't need to instantiate the cpus to use it. That lets us move the resolution of the capability defaults much earlier. This is going to be necessary for some future capabilities. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org>
2018-06-21  ppc/pnv: introduce Pnv8Chip and Pnv9Chip models  (Cédric Le Goater)
It introduces a base PnvChip class from which the specific processor chip classes, Pnv8Chip and Pnv9Chip, inherit. Each of them needs to define an init and a realize routine which will create the controllers of the target processor. For the moment, the base PnvChip class handles the XSCOM bus and the cores. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-21  spapr_cpu_core: migrate per-CPU data  (Greg Kurz)
A per-CPU machine data pointer was recently added to PowerPCCPU. The motivation is to hide platform-specific details from the core CPU code. This per-CPU data can hold state which is relevant to the guest though, e.g. Virtual Processor Areas, and we should migrate this state. This patch adds the plumbing so that we can migrate the per-CPU data for PAPR guests. We only do this for newer machine types for the sake of backward compatibility. No state is migrated for the moment: the vmstate_spapr_cpu_state structure will be populated by subsequent patches. Signed-off-by: Greg Kurz <groug@kaod.org> [dwg: Fix some trivial spelling and spacing errors] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-21  ppc/pnv: introduce a new isa_create() operation to the chip model  (Cédric Le Goater)
This moves the details of the ISA bus creation under the LPC model but, more importantly, the new PnvChip operation will let us choose the chip class to use when we introduce the different chip classes for Power9 and Power8. It hides away the processor chip controllers from the machine. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-21  ppc/pnv: introduce a new intc_create() operation to the chip model  (Cédric Le Goater)
On Power9, the thread interrupt presenter has a different type and is linked to the chip owning the cores. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-19  Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging  (Peter Maydell)
Block layer patches: - Active mirror (blockdev-mirror copy-mode=write-blocking) - bdrv_drain_*() fixes and test cases - Fix crash with scsi-hd and drive_del # gpg: Signature made Mon 18 Jun 2018 17:44:10 BST # gpg: using RSA key 7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (35 commits) iotests: Add test for active mirroring block/mirror: Add copy mode QAPI interface block/mirror: Add active mirroring job: Add job_progress_increase_remaining() block/mirror: Add MirrorBDSOpaque block/dirty-bitmap: Add bdrv_dirty_iter_next_area test-hbitmap: Add non-advancing iter_next tests hbitmap: Add @advance param to hbitmap_iter_next() block: Generalize should_update_child() rule block/mirror: Use source as a BdrvChild block/mirror: Wait for in-flight op conflicts block/mirror: Use CoQueue to wait on in-flight ops block/mirror: Convert to coroutines block/mirror: Pull out mirror_perform() block: fix QEMU crash with scsi-hd and drive_del test-bdrv-drain: Test graph changes in drain_all section block: Allow graph changes in bdrv_drain_all_begin/end sections block: ignore_bds_parents parameter for drain functions block: Move bdrv_drain_all_begin() out of coroutine context block: Allow AIO_WAIT_WHILE with NULL ctx ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-06-19  Merge remote-tracking branch 'remotes/kraxel/tags/vga-20180618-pull-request' ...  (Peter Maydell)
into staging vga: add ramfb, print virglrenderer version # gpg: Signature made Mon 18 Jun 2018 10:57:38 BST # gpg: using RSA key 4CB6D8EED3E87138 # gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>" # gpg: aka "Gerd Hoffmann <gerd@kraxel.org>" # gpg: aka "Gerd Hoffmann (private) <kraxel@gmail.com>" # Primary key fingerprint: A032 8CFF B93A 17A7 9901 FE7D 4CB6 D8EE D3E8 7138 * remotes/kraxel/tags/vga-20180618-pull-request: Add ramfb MAINTAINERS entry hw/display: add standalone ramfb device hw/display: add ramfb, a simple boot framebuffer living in guest ram configure: print virglrenderer version Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-06-18  block/mirror: Add copy mode QAPI interface  (Max Reitz)
This patch allows the user to specify whether to use active or only background mode for mirror block jobs. Currently, this setting will remain constant for the duration of the entire block job. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20180613181823.13618-14-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-18  job: Add job_progress_increase_remaining()  (Max Reitz)
Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20180613181823.13618-12-mreitz@redhat.com Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-18  block/dirty-bitmap: Add bdrv_dirty_iter_next_area  (Max Reitz)
This new function allows the caller to look for a consecutively dirty area in a dirty bitmap. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20180613181823.13618-10-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-18  hbitmap: Add @advance param to hbitmap_iter_next()  (Max Reitz)
This new parameter allows the caller to just query the next dirty position without moving the iterator. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20180613181823.13618-8-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
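The idea can be shown with a toy bit iterator (a simplified sketch, not QEMU's HBitmap implementation): with advance=false the caller peeks at the next dirty position, with advance=true the iterator also moves past it.

```c
/* Toy iterator sketch illustrating an @advance flag; not QEMU's hbitmap code. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint64_t bits;   /* bit i set => position i is "dirty" */
    int pos;         /* next position to examine */
} ToyIter;

/* Return the next set bit at or after it->pos, or -1 if none is left.
 * The iterator only moves past the result when advance is true. */
static int toy_iter_next(ToyIter *it, bool advance)
{
    for (int i = it->pos; i < 64; i++) {
        if (it->bits & (UINT64_C(1) << i)) {
            if (advance) {
                it->pos = i + 1;
            }
            return i;
        }
    }
    return -1;
}

int main(void)
{
    ToyIter it = { .bits = 0x28, .pos = 0 };   /* bits 3 and 5 are dirty */

    printf("peek:    %d\n", toy_iter_next(&it, false));  /* 3 */
    printf("peek:    %d\n", toy_iter_next(&it, false));  /* still 3 */
    printf("advance: %d\n", toy_iter_next(&it, true));   /* 3, now consumed */
    printf("advance: %d\n", toy_iter_next(&it, true));   /* 5 */
    return 0;
}
```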
2018-06-18  block: Allow graph changes in bdrv_drain_all_begin/end sections  (Kevin Wolf)
bdrv_drain_all_*() used bdrv_next() to iterate over all root nodes and did a subtree drain for each of them. This works fine as long as the graph is static, but sadly, reality looks different. If the graph changes so that root nodes are added or removed, we would have to compensate for this. bdrv_next() returns each root node only once even if it's the root node for multiple BlockBackends or for a monitor-owned block driver tree, which would only complicate things. The much easier and more obviously correct way is to fundamentally change the way the functions work: Iterate over all BlockDriverStates, no matter who owns them, and drain them individually. Compensation is only necessary when a new BDS is created inside a drain_all section. Removal of a BDS doesn't require any action because it's gone afterwards anyway. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18  block: ignore_bds_parents parameter for drain functions  (Kevin Wolf)
In the future, bdrv_drain_all_begin/end() will drain all individual nodes separately rather than whole subtrees. This means that we don't want to propagate the drain to all parents any more: If the parent is a BDS, it will already be drained separately. Recursing to all parents is unnecessary work and would make it an O(n²) operation. Prepare the drain function for the changed drain_all by adding an ignore_bds_parents parameter to the internal implementation that prevents the propagation of the drain to BDS parents. We still (have to) propagate it to non-BDS parents like BlockBackends or Jobs because those are not drained separately. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18  block: Allow AIO_WAIT_WHILE with NULL ctx  (Kevin Wolf)
bdrv_drain_all() wants to have a single polling loop for draining the in-flight requests of all nodes. This means that the AIO_WAIT_WHILE() condition relies on activity in multiple AioContexts, which is polled from the mainloop context. We must therefore call AIO_WAIT_WHILE() from the mainloop thread and use the AioWait notification mechanism. Just randomly picking the AioContext of any non-mainloop thread would work, but instead of bothering to find such a context in the caller, we can just as well accept NULL for ctx. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18  block: Don't poll in parent drain callbacks  (Kevin Wolf)
bdrv_do_drained_begin() is only safe if we have a single BDRV_POLL_WHILE() after quiescing all affected nodes. We cannot allow that parent callbacks introduce a nested polling loop that could cause graph changes while we're traversing the graph. Split off bdrv_do_drained_begin_quiesce(), which only quiesces a single node without waiting for its requests to complete. These requests will be waited for in the BDRV_POLL_WHILE() call down the call chain. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18  block: Drain recursively with a single BDRV_POLL_WHILE()  (Kevin Wolf)
Anything can happen inside BDRV_POLL_WHILE(), including graph changes that may interfere with its callers (e.g. child list iteration in recursive callers of bdrv_do_drained_begin). Switch to a single BDRV_POLL_WHILE() call for the whole subtree at the end of bdrv_do_drained_begin() to avoid such effects. The recursion happens now inside the loop condition. As the graph can only change between bdrv_drain_poll() calls, but not inside of it, doing the recursion here is safe. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18  block: Really pause block jobs on drain  (Kevin Wolf)
We already requested that block jobs be paused in .bdrv_drained_begin, but no guarantee was made that the job was actually inactive at the point where bdrv_drained_begin() returned. This introduces a new callback BdrvChildRole.bdrv_drained_poll() and uses it to make bdrv_drain_poll() consider block jobs using the node to be drained. For the test case to work as expected, we have to switch from block_job_sleep_ns() to qemu_co_sleep_ns() so that the test job is even considered active and must be waited for when draining the node. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18  block: Avoid unnecessary aio_poll() in AIO_WAIT_WHILE()  (Kevin Wolf)
Commit 91af091f923 added an additional aio_poll() to BDRV_POLL_WHILE() in order to make sure that all pending BHs are executed on drain. This was the wrong place to make the fix, as it is useless overhead for all other users of the macro and unnecessarily complicates the mechanism. This patch effectively reverts said commit (the context has changed a bit and the code has moved to AIO_WAIT_WHILE()) and instead polls in the loop condition for drain. The effect is probably hard to measure in any real-world use case because actual I/O will dominate, but if I run only the initialisation part of 'qemu-img convert' where it calls bdrv_block_status() for the whole image to find out how much data there is to copy, this phase actually needs only roughly half the time after this patch. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-18  hw/display: add standalone ramfb device  (Gerd Hoffmann)
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Tested-by: Laszlo Ersek <lersek@redhat.com> Message-id: 20180613122948.18149-3-kraxel@redhat.com Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2018-06-18  hw/display: add ramfb, a simple boot framebuffer living in guest ram  (Gerd Hoffmann)
The boot framebuffer is expected to be configured by the firmware, so it uses fw_cfg as the interface. Initialization goes as follows: (1) Check whether etc/ramfb is present. (2) Allocate the framebuffer from RAM. (3) Fill struct RAMFBCfg and write it to etc/ramfb. Done. You can write stuff to the framebuffer now, and it should appear automagically on the screen. Note that this isn't very efficient because it does a full display update on each refresh. No dirty tracking. Dirty tracking would have to be active for the whole ram slot, so that wouldn't be very efficient either. For a boot display which is active for a short time only, this isn't a big deal. As a permanent guest display, something better should be used (if possible). This is the ramfb core code. Some wiring up is needed for display devices which want to have a ramfb boot display. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Tested-by: Laszlo Ersek <lersek@redhat.com> Message-id: 20180613122948.18149-2-kraxel@redhat.com Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
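A rough sketch of the firmware-side flow described above (the RAMFBCfg field names and the fw_cfg helper are assumptions made for this sketch; the real interface also specifies field endianness): allocate the framebuffer, fill the config struct, write it to etc/ramfb.

```c
/* Illustrative sketch of the firmware-side ramfb setup flow described above.
 * The RAMFBCfg layout and the fw_cfg helper below are assumptions made for
 * this sketch, not the exact QEMU/firmware interface. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct RAMFBCfg {            /* assumed fields: framebuffer address + geometry */
    uint64_t addr;
    uint32_t fourcc;         /* pixel format code */
    uint32_t flags;
    uint32_t width;
    uint32_t height;
    uint32_t stride;
};

/* Stand-in for a firmware fw_cfg DMA write to the named file. */
static void fwcfg_write_file(const char *name, const void *buf, size_t len)
{
    (void)buf;
    printf("fw_cfg write: %s (%zu bytes)\n", name, len);
}

int main(void)
{
    uint32_t width = 1024, height = 768, bpp = 4;
    void *fb = calloc((size_t)width * height, bpp);   /* (2) allocate from RAM */
    struct RAMFBCfg cfg = {                           /* (3) fill the config */
        .addr   = (uintptr_t)fb,
        .width  = width,
        .height = height,
        .stride = width * bpp,
    };

    fwcfg_write_file("etc/ramfb", &cfg, sizeof(cfg)); /* tell QEMU where it is */
    /* From here on, pixels written to fb would appear on the QEMU display. */
    free(fb);
    return 0;
}
```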
2018-06-16  target/ppc, spapr: Move VPA information to machine_data  (David Gibson)
CPUPPCState currently contains a number of fields containing the state of the VPA. The VPA is a PAPR-specific concept covering several guest/host shared memory areas used to communicate some information with the hypervisor. As a PAPR concept this is really machine-specific information, although it is per-cpu, so it doesn't really belong in the core CPU state structure. There's also other information that's per-cpu, but platform/machine specific. So create a (void *)machine_data in PowerPCCPU which can be used by the machine to locate per-cpu data. Initialization, lifetime and cleanup of machine_data is entirely up to the machine type. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Greg Kurz <groug@kaod.org>
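A minimal sketch of the pattern (simplified stand-in types, not the actual PowerPCCPU or sPAPR structures): the core CPU struct carries only an opaque pointer, and the machine code owns the allocation, layout and cleanup.

```c
/* Sketch of per-CPU machine_data owned entirely by the machine type.
 * Simplified stand-ins, not the real PowerPCCPU or sPAPR definitions. */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct CPU {
    int cpu_index;
    void *machine_data;       /* opaque to the core CPU code */
} CPU;

/* Machine-specific per-CPU state, e.g. VPA registration details. */
typedef struct MachineCPUState {
    uint64_t vpa_addr;
} MachineCPUState;

/* The machine type allocates and frees the data; core code never touches it. */
static void machine_cpu_init(CPU *cpu)
{
    cpu->machine_data = calloc(1, sizeof(MachineCPUState));
}

static void machine_cpu_fini(CPU *cpu)
{
    free(cpu->machine_data);
    cpu->machine_data = NULL;
}

int main(void)
{
    CPU cpu = { .cpu_index = 0 };
    MachineCPUState *s;

    machine_cpu_init(&cpu);
    s = cpu.machine_data;
    s->vpa_addr = 0x1000;
    printf("cpu %d vpa_addr = 0x%" PRIx64 "\n", cpu.cpu_index, s->vpa_addr);
    machine_cpu_fini(&cpu);
    return 0;
}
```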
2018-06-16  pnv_core: Allocate cpu thread objects individually  (David Gibson)
Currently, we allocate space for all the cpu objects within a single core in one big block. This was copied from an older version of the spapr code and requires some ugly pointer manipulation to extract the individual objects. This design was due to a misunderstanding of qemu lifetime conventions and has already been changed in spapr (in 94ad93bd "spapr_cpu_core: instantiate CPUs separately"). Make an equivalent change in pnv_core to get rid of the nasty pointer arithmetic. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org>
2018-06-16  mos6522: expose mos6522_update_irq() through MOS6522DeviceClass  (Mark Cave-Ayland)
In the case where we have an interrupt generated externally from inputs to bits 1 and 2 of port A and/or port B, it is necessary to expose mos6522_update_irq() so it can be called by the interrupt source. Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-16  mac_newworld: add PMU device  (Mark Cave-Ayland)
The PMU device supersedes the CUDA device found on older New World Macs and is supported by a larger number of guest OSs from OS 9 to OS X 10.5. Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-16  adb: add property to disable direct reg 3 writes  (Mark Cave-Ayland)
MacOS 9 has a bug in its PMU driver whereby after configuring the ADB bus devices it sends another write to reg 3 on both devices resetting them both back to the same address. Add a new disable_direct_reg3_writes property to ADBDevice to disable these direct writes, which can be enabled just for the upcoming pmu-adb support. Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-16  mac_newworld: add gpios to macio devices with PMU enabled  (Mark Cave-Ayland)
PMU-enabled New World Macs expose their GPIOs via a separate memory region within the macio device. Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-16  mac_newworld: add via machine option to control mac99 VIA/ADB configuration  (Mark Cave-Ayland)
This option allows the VIA configuration to be controlled between 3 different possible setups: cuda, pmu-adb and pmu with USB rather than ADB keyboard/mouse. For the moment we don't do anything with the configuration except to pass it to the macio device (the via-cuda parent) and also to the firmware via the fw_cfg interface so that it can present the correct device tree. The default is cuda which is the current default and so will have no change in behaviour. Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-15  tcg: remove tb_lock  (Emilio G. Cota)
Use mmap_lock in user-mode to protect TCG state and the page descriptors. In !user-mode, each vCPU has its own TCG state, so no locks needed. Per-page locks are used to protect the page descriptors. Per-TB locks are used in both modes to protect TB jumps. Some notes:
- tb_lock is removed from notdirty_mem_write by passing a locked page_collection to tb_invalidate_phys_page_fast.
- tcg_tb_lookup/remove/insert/etc have their own internal lock(s), so there is no need to further serialize access to them.
- do_tb_flush is run in a safe async context, meaning no other vCPU threads are running. Therefore acquiring mmap_lock there is just to please tools such as thread sanitizer.
- Not visible in the diff, but tb_invalidate_phys_page already has an assert_memory_lock.
- cpu_io_recompile is !user-only, so no mmap_lock there.
- Added mmap_unlock()'s before all siglongjmp's that could be called in user-mode while mmap_lock is held.
+ Added an assert for !have_mmap_lock() after returning from the longjmp in cpu_exec, just like we do in cpu_exec_step_atomic.
Performance numbers before/after. Host: AMD Opteron(tm) Processor 6376.
[ASCII chart: ubuntu 17.04 ppc64 bootup+shutdown time, "before" vs "tb lock removal", 1 to 64 guest CPUs] png: https://imgur.com/HwmBHXe
[ASCII chart: debian jessie aarch64 bootup+shutdown time, "before" vs "tb lock removal", 1 to 64 guest CPUs] png: https://imgur.com/iGpGFtv
The gains are high for 4-8 CPUs. Beyond that point, however, unrelated lock contention significantly hurts scalability.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-06-15  translate-all: protect TB jumps with a per-destination-TB lock  (Emilio G. Cota)
This applies to both user-mode and !user-mode emulation. Instead of relying on a global lock, protect the list of incoming jumps with tb->jmp_lock. This lock also protects tb->cflags, so update all tb->cflags readers outside tb->jmp_lock to use atomic reads via tb_cflags(). In order to find the destination TB (and therefore its jmp_lock) from the origin TB, we introduce tb->jmp_dest[]. I considered not using a linked list of jumps, which simplifies code and makes the struct smaller. However, it unnecessarily increases memory usage, which results in a performance decrease. See for instance these numbers booting+shutting down debian-arm:
                     Time (s)  Rel. err (%)  Abs. err (s)  Rel. slowdown (%)
---------------------------------------------------------------------------
before                  20.88          0.74      0.154512         0.
after                   20.81          0.38      0.079078        -0.33524904
GTree                   21.02          0.28      0.058856         0.67049808
GHashTable + xxhash     21.63          1.08      0.233604         3.5919540
Using a hash table or a binary tree to keep track of the jumps doesn't really pay off, not only due to the increased memory usage, but also because most TBs have only 0 or 1 jumps to them. The maximum number of jumps when booting debian-arm that I measured is 35, but as we can see in the histogram below a TB with that many incoming jumps is extremely rare; the average TB has 0.80 incoming jumps. n_jumps: 379208; avg jumps/tb: 0.801099 dist: [0.0,1.0)|▄█▁▁▁▁▁▁▁▁▁▁▁ ▁▁▁▁▁▁ ▁▁▁ ▁|[34.0,35.0]
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
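The locking scheme can be sketched with a simplified model (not the real TranslationBlock layout or jmp_lock code): each block owns a lock protecting its own list of incoming jumps, so linking a jump only takes the destination block's lock.

```c
/* Simplified sketch of per-destination incoming-jump lists guarded by a
 * per-block lock. Not the real TranslationBlock/jmp_lock code. */
#include <pthread.h>
#include <stdio.h>

typedef struct Block Block;

typedef struct Jump {
    Block *from;
    struct Jump *next;
} Jump;

struct Block {
    pthread_mutex_t jmp_lock;  /* protects jmp_list (incoming jumps) */
    Jump *jmp_list;
};

/* Register a direct jump from @from to @to: only @to's lock is needed. */
static void block_link_jump(Block *from, Block *to, Jump *j)
{
    j->from = from;
    pthread_mutex_lock(&to->jmp_lock);
    j->next = to->jmp_list;
    to->jmp_list = j;
    pthread_mutex_unlock(&to->jmp_lock);
}

static Block block_a = { .jmp_lock = PTHREAD_MUTEX_INITIALIZER };
static Block block_b = { .jmp_lock = PTHREAD_MUTEX_INITIALIZER };

int main(void)
{
    Jump j;

    block_link_jump(&block_a, &block_b, &j);
    printf("block_b has an incoming jump from %p\n",
           (void *)block_b.jmp_list->from);
    return 0;
}
```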
2018-06-15  translate-all: introduce assert_no_pages_locked  (Emilio G. Cota)
The appended patch adds assertions to make sure we do not longjmp with page locks held. Note that user-mode has nothing to check, since page_locks are !user-mode only. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-06-15  translate-all: use per-page locking in !user-mode  (Emilio G. Cota)
Groundwork for supporting parallel TCG generation. Instead of using a global lock (tb_lock) to protect changes to pages, use fine-grained, per-page locks in !user-mode. User-mode stays with mmap_lock. Sometimes changes need to happen atomically on more than one page (e.g. when a TB that spans across two pages is added/invalidated, or when a range of pages is invalidated). We therefore introduce struct page_collection, which helps us keep track of a set of pages that have been locked in the appropriate locking order (i.e. by ascending page index). This commit first introduces the structs and the function helpers, to then convert the calling code to use per-page locking. Note that tb_lock is not removed yet. While at it, rename tb_alloc_page to tb_page_add, which pairs with tb_page_remove. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
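The ordering rule this relies on, taking per-page locks in ascending page index so that concurrent lockers cannot deadlock, can be sketched with a toy page-lock table (illustrative only, not the real PageDesc/page_collection code):

```c
/* Sketch of taking per-page locks in ascending page index order, as described
 * above. Toy structures only; not the real PageDesc/page_collection code. */
#include <pthread.h>
#include <stdio.h>

#define NPAGES 16

static pthread_mutex_t page_locks[NPAGES];

/* Lock every page covering [p1, p2], always walking in ascending order so
 * that two threads locking overlapping ranges cannot deadlock. */
static void pages_lock(unsigned p1, unsigned p2)
{
    unsigned lo = p1 < p2 ? p1 : p2;
    unsigned hi = p1 < p2 ? p2 : p1;

    for (unsigned p = lo; p <= hi; p++) {
        pthread_mutex_lock(&page_locks[p]);
    }
}

static void pages_unlock(unsigned p1, unsigned p2)
{
    unsigned lo = p1 < p2 ? p1 : p2;
    unsigned hi = p1 < p2 ? p2 : p1;

    for (unsigned p = lo; p <= hi; p++) {
        pthread_mutex_unlock(&page_locks[p]);
    }
}

int main(void)
{
    for (unsigned i = 0; i < NPAGES; i++) {
        pthread_mutex_init(&page_locks[i], NULL);
    }
    /* e.g. a TB spanning pages 5 and 6: both locks taken, lowest index first. */
    pages_lock(6, 5);
    printf("pages 5-6 locked\n");
    pages_unlock(6, 5);
    return 0;
}
```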
2018-06-15  translate-all: iterate over TBs in a page with PAGE_FOR_EACH_TB  (Emilio G. Cota)
This commit does several things, but to avoid churn I merged them all into the same commit. To wit: - Use uintptr_t instead of TranslationBlock * for the list of TBs in a page. Just like we did in (c37e6d7e "tcg: Use uintptr_t type for jmp_list_{next|first} fields of TB"), the rationale is the same: these are tagged pointers, not pointers. So use a more appropriate type. - Only check the least significant bit of the tagged pointers. Masking with 3/~3 is unnecessary and confusing. - Introduce the TB_FOR_EACH_TAGGED macro, and use it to define PAGE_FOR_EACH_TB, which improves readability. Note that TB_FOR_EACH_TAGGED will gain another user in a subsequent patch. - Update tb_page_remove to use PAGE_FOR_EACH_TB. In case there is a bug and we attempt to remove a TB that is not in the list, instead of segfaulting (since the list is NULL-terminated) we will reach g_assert_not_reached(). Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
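The tagged-pointer iteration can be illustrated with a small standalone macro (a sketch of the general technique, not the exact TB_FOR_EACH_TAGGED/PAGE_FOR_EACH_TB definitions): the low bit of each stored uintptr_t selects which of the node's two next slots continues the list.

```c
/* Sketch of iterating a list of tagged pointers, where the low bit of each
 * stored value selects which of the node's two "next" slots to follow.
 * Illustrative only; not the exact QEMU macros. */
#include <stdint.h>
#include <stdio.h>

typedef struct Node {
    int id;
    uintptr_t next[2];   /* tagged links: bit 0 is the slot index in the next node */
} Node;

#define TAGGED_PTR(v)  ((Node *)((v) & ~(uintptr_t)1))
#define TAGGED_IDX(v)  ((v) & 1)

#define FOR_EACH_TAGGED(head, node, idx)                            \
    for (uintptr_t _cur = (head);                                   \
         (node = TAGGED_PTR(_cur), idx = TAGGED_IDX(_cur), node);   \
         _cur = node->next[idx])

int main(void)
{
    Node c = { .id = 3, .next = { 0, 0 } };
    Node b = { .id = 2, .next = { (uintptr_t)&c, 0 } };      /* c via slot 0 */
    Node a = { .id = 1, .next = { 0, (uintptr_t)&b } };      /* b via slot 0 */
    uintptr_t head = (uintptr_t)&a | 1;                      /* a via slot 1 */

    Node *n;
    unsigned idx;

    FOR_EACH_TAGGED(head, n, idx) {
        printf("node %d (slot %u)\n", n->id, idx);
    }
    return 0;
}
```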
2018-06-15  tcg: move tb_ctx.tb_phys_invalidate_count to tcg_ctx  (Emilio G. Cota)
Thereby making it per-TCGContext. Once we remove tb_lock, this will avoid an atomic increment every time a TB is invalidated. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-06-15  tcg: track TBs with per-region BST's  (Emilio G. Cota)
This paves the way for enabling scalable parallel generation of TCG code. Instead of tracking TBs with a single binary search tree (BST), use a BST for each TCG region, protecting it with a lock. This is as scalable as it gets, since each TCG thread operates on a separate region. The core of this change is the introduction of struct tcg_region_tree, which contains a pointer to a GTree and an associated lock to serialize accesses to it. We then allocate an array of tcg_region_tree's, adding the appropriate padding to avoid false sharing based on qemu_dcache_linesize. Given a tc_ptr, we first find the corresponding region_tree. This is done by special-casing the first and last regions first, since they might be of size != region.size; otherwise we just divide the offset by region.stride. I was worried about this division (several dozen cycles of latency), but profiling shows that this is not a fast path. Note that region.stride is not required to be a power of two; it is only required to be a multiple of the host's page size. Note that with this design we can also provide consistent snapshots about all region trees at once; for instance, tcg_tb_foreach acquires/releases all region_tree locks before/after iterating over them. For this reason we now drop tb_lock in dump_exec_info(). As an alternative I considered implementing a concurrent BST, but this can be tricky to get right, offers no consistent snapshots of the BST, and performance and scalability-wise I don't think it could ever beat having separate GTrees, given that our workload is insert-mostly (all concurrent BST designs I've seen focus, understandably, on making lookups fast, which comes at the expense of convoluted, non-wait-free insertions/removals). Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
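The tc_ptr-to-region lookup described above can be sketched as follows (simplified, with made-up field names rather than the real tcg region code): special-case the first and last regions, otherwise divide the offset by the region stride.

```c
/* Sketch of mapping a code pointer to its TCG region index, as described
 * above. Field names and layout are assumptions, not the real tcg code. */
#include <stddef.h>
#include <stdio.h>

struct region_info {
    size_t start;    /* start of the first region */
    size_t end;      /* end of the last region */
    size_t stride;   /* distance between consecutive region starts */
    size_t n;        /* number of regions */
};

static size_t region_index_for(const struct region_info *r, size_t tc_ptr)
{
    /* First and last regions may have a different size, so test them first. */
    if (tc_ptr < r->start + r->stride) {
        return 0;
    }
    if (tc_ptr >= r->end - r->stride) {
        return r->n - 1;
    }
    /* Interior regions are evenly spaced: a single division finds the index. */
    return (tc_ptr - r->start) / r->stride;
}

int main(void)
{
    struct region_info r = { .start = 0x1000, .end = 0x9000,
                             .stride = 0x2000, .n = 4 };

    printf("%zu %zu %zu\n",
           region_index_for(&r, 0x1800),   /* 0 */
           region_index_for(&r, 0x4000),   /* 1 */
           region_index_for(&r, 0x8800));  /* 3 */
    return 0;
}
```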
2018-06-15  qht: return existing entry when qht_insert fails  (Emilio G. Cota)
The meaning of "existing" is now changed to "matches in hash and ht->cmp result". This is saner than just checking the pointer value. Suggested-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
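A sketch of the calling pattern this enables (using a stand-in hash set, not the real qht API): when the insert finds a matching entry already present, the caller adopts it instead of doing a second lookup.

```c
/* Stand-in insert-or-get illustrating the "return existing entry" pattern.
 * Not the real qht_insert() signature or implementation. */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define NBUCKETS 8

typedef struct Entry {
    const char *key;
    struct Entry *next;
} Entry;

static Entry *buckets[NBUCKETS];

static unsigned hash(const char *s)
{
    unsigned h = 5381;

    while (*s) {
        h = h * 33 + (unsigned char)*s++;
    }
    return h % NBUCKETS;
}

/* Insert @e; if an equal entry already exists, return false and set *existing. */
static bool set_insert(Entry *e, Entry **existing)
{
    unsigned b = hash(e->key);

    for (Entry *cur = buckets[b]; cur; cur = cur->next) {
        if (strcmp(cur->key, e->key) == 0) {
            *existing = cur;
            return false;
        }
    }
    e->next = buckets[b];
    buckets[b] = e;
    *existing = NULL;
    return true;
}

int main(void)
{
    Entry a = { "tb1" }, b = { "tb1" }, *existing;

    set_insert(&a, &existing);
    if (!set_insert(&b, &existing)) {
        /* Reuse the entry another thread (or an earlier call) already inserted. */
        printf("reusing existing entry %p\n", (void *)existing);
    }
    return 0;
}
```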