aboutsummaryrefslogtreecommitdiff
path: root/hw
AgeCommit message (Collapse)Author
2019-02-26spapr_pci: add ibm, my-drc-index property for PHB hotplugMichael Roth
This is needed to denote a boot-time PHB as being hot-pluggable. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155059672420.1466090.15147504040270659866.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26spapr_pci: provide node start offset via spapr_populate_pci_dt()Michael Roth
PHB hotplug re-uses PHB device tree generation code and passes it to a guest via RTAS. Doing this requires knowledge of where exactly in the device tree the node describing the PHB begins. Provide this via a new optional pointer that can be used to store the PHB node's start offset. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155059671912.1466090.10891589403973703473.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26spapr_events: add support for phb hotplug eventsMichael Roth
Extend the existing EPOW event format we use for PCI devices to emit PHB plug/unplug events. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155059671405.1466090.535964535260503283.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26spapr: populate PHB DRC entries for root DT nodeNathan Fontenot
This add entries to the root OF node to advertise our PHBs as being DR-capable in accordance with PAPR specification. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155059670897.1466090.10843921337591637414.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26spapr: create DR connectors for PHBsMichael Roth
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155059670389.1466090.10015601248906623076.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26spapr_pci: add PHB unrealizeGreg Kurz
To support PHB hotplug we need to clean up lingering references, memory, child properties, etc. prior to the PHB object being finalized. Generally this will be called as a result of calling object_unparent() on the PHB object, which in turn would normally be called as the result of an unplug() operation. When the PHB is finalized, child objects will be unparented in turn, and finalized if the PHB was the only reference holder. so we don't bother to explicitly unparent child objects of the PHB, with the notable exception of DRCs. This is needed to avoid a QEMU crash when unplugging a PHB and resetting the machine before the guest could handle the event. The DRCs are removed from the QOM tree by pci_unregister_root_bus() and we must make sure we're not leaving stale aliases under the global /dr-connector path. The formula that gives the number of DMA windows is moved to an inline function in the hw/pci-host/spapr.h header because it will have other users. The unrealize function is able to cope with partially realized PHBs. It is hence used to implement proper rollback on the realize error path. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <155059669881.1466090.13515030705986041517.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26spapr_irq: Expose the phandle of the interrupt controllerGreg Kurz
This will be used by PHB hotplug in order to create the "interrupt-map" property of the PHB node. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155059669374.1466090.12943228478046223856.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26spapr: Expose the name of the interrupt controller nodeGreg Kurz
This will be needed by PHB hotplug in order to access the "phandle" property of the interrupt controller node. Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <155059668867.1466090.6339199751719123386.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26xics: Write source state to KVM at claim timeGreg Kurz
The pseries machine only uses LSIs to support legacy PCI devices. Every PHB claims 4 LSIs at realize time. When using in-kernel XICS (or upcoming in-kernel XIVE), QEMU synchronizes the state of all irqs, including these LSIs, later on at machine reset. In order to support PHB hotplug, we need a way to tell KVM about the LSIs that doesn't require a machine reset. An easy way to do that is to always inform KVM when an interrupt is claimed, which really isn't a performance path. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155059668360.1466090.5969630516627776426.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26spapr/drc: Drop spapr_drc_attach() fdt argumentGreg Kurz
All DRC subtypes have been converted to generate the FDT fragment at configure connector time instead of attach time. The fdt and fdt_offset arguments of spapr_drc_attach() aren't needed anymore. Drop them and make the implementation of the dt_populate() method mandatory. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155059667853.1466090.16527852453054217565.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26spapr/pci: Generate FDT fragment at configure connector timeGreg Kurz
Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155059667346.1466090.326696113231137772.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26spapr: Generate FDT fragment for CPUs at configure connector timeGreg Kurz
Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155059666839.1466090.3833376527523126752.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26spapr: Generate FDT fragment for LMBs at configure connector timeGreg Kurz
Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155059666331.1466090.6766540766297333313.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26spapr_drc: Allow FDT fragment to be added laterGreg Kurz
The current logic is to provide the FDT fragment when attaching a device to a DRC. This works perfectly fine for our current hotplug support, but soon we will add support for PHB hotplug which has some constraints, that CPU, PCI and LMB devices don't seem to have. The first constraint is that the "ibm,dma-window" property of the PHB node requires the IOMMU to be configured, ie, spapr_tce_table_enable() has been called, which happens during PHB reset. It is okay in the case of hotplug since the device is reset before the hotplug handler is called. On the contrary with coldplug, the hotplug handler is called first and device is only reset during the initial system reset. Trying to create the FDT fragment on the hotplug path in this case, would result in somthing like this: ibm,dma-window = < 0x80000000 0x00 0x00 0x00 0x00 >; This will cause linux in the guest to panic, by simply removing and re-adding the PHB using the drmgr command: page = alloc_pages_node(nid, GFP_KERNEL, get_order(sz)); if (!page) panic("iommu_init_table: Can't allocate %ld bytes\n", sz); The second and maybe more problematic constraint is that the "interrupt-map" property needs to reference the interrupt controller node using the very same phandle that SLOF has already exposed to the guest. QEMU requires SLOF to call the private KVMPPC_H_UPDATE_DT hcall at some point to know about this phandle. With the latest QEMU and SLOF, this happens when SLOF gets quiesced. This means that if the PHB gets hotplugged after CAS but before SLOF quiesce, then we're sure that the phandle is not known when the hotplug handler is called. The FDT is only needed when the guest first invokes RTAS to configure the connector actually, long after SLOF quiesce. Let's postpone the creation of FDT fragments for PHBs to rtas_ibm_configure_connector(). Since we only need this for PHBs, introduce a new method in the base DRC class for that. DRC subtypes will be converted to use it in subsequent patches. Allow spapr_drc_attach() to be passed a NULL fdt argument if the method is available. When all DRC subtypes have been converted, the fdt argument will eventually disappear. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155059665823.1466090.18358845122627355537.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26target/ppc: Rename PATB/PATBE -> PATEBenjamin Herrenschmidt
That "b" means "base address" and thus shouldn't be in the name of actual entries and related constants. This patch keeps the synthetic patb_entry field of the spapr virtual hypervisor unchanged until I figure out if that has an impact on the migration stream. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190215170029.15641-11-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26target/ppc: Fix ordering of hash MMU accessesBenjamin Herrenschmidt
With mttcg, we can have MMU lookups happening at the same time as the guest modifying the page tables. Since the HPTEs of the hash table MMU contains two words (or double worlds on 64-bit), we need to make sure we read them in the right order, with the correct memory barrier. Additionally, when using emulated SPAPR mode, the hypercalls writing to the hash table must also perform the udpates in the right order. Note: This part is still not entirely correct Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190215170029.15641-7-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26target/ppc/spapr: Set LPCR:HR when using Radix modeBenjamin Herrenschmidt
The HW relies on LPCR:HR along with the PATE to determine whether to use Radix or Hash mode. In fact it uses LPCR:HR more commonly than the PATE. For us, it's also more efficient to do so, especially since unlike the HW we do not maintain a cache of the current PATE and HV PATE in a generic place. Prepare the grounds for that by ensuring that LPCR:HR is set properly on SPAPR machines. Another option would have been to use a callback to get the PATE but this gets messy when implementing bare metal support, it's much simpler (and faster) to use LPCR. Since existing migration streams may not have it, fix it up in spapr_post_load() as well based on the pseudo-PATE entry that we keep. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190215170029.15641-2-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26spapr: support memory unplug for qtestDavid Hildenbrand
Fake availability of OV5_HP_EVT, so we can test memory unplug in qtest. Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20190218092202.26683-3-david@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26ppc: add host-serial and host-model machine attributes (CVE-2019-8934)Prasad J Pandit
On ppc hosts, hypervisor shares following system attributes - /proc/device-tree/system-id - /proc/device-tree/model with a guest. This could lead to information leakage and misuse.[*] Add machine attributes to control such system information exposure to a guest. [*] https://wiki.openstack.org/wiki/OSSN/OSSN-0028 Reported-by: Daniel P. Berrangé <berrange@redhat.com> Fix-suggested-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org> Message-Id: <20190218181349.23885-1-ppandit@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26target/ppc: Add POWER9 external interrupt modelBenjamin Herrenschmidt
Adds support for the Hypervisor directed interrupts in addition to the OS ones. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [clg: - modified the icp_realize() and xive_tctx_realize() to take into account explicitely the POWER9 interrupt model - introduced a specific power9_set_irq for POWER9 ] Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190215161648.9600-10-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26target/ppc: Rename "in_pm_state" to "resume_as_sreset"Benjamin Herrenschmidt
To better reflect what this does, as it's specific to some of the P7/P8/P9 PM states, not generic. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20190215161648.9600-6-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-25Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into ↵Peter Maydell
staging Pull request # gpg: Signature made Fri 22 Feb 2019 14:07:01 GMT # gpg: using RSA key 9CA4ABB381AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [full] # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" [full] # Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8 * remotes/stefanha/tags/block-pull-request: (27 commits) tests/virtio-blk: add test for DISCARD command tests/virtio-blk: add test for WRITE_ZEROES command tests/virtio-blk: add virtio_blk_fix_dwz_hdr() function tests/virtio-blk: change assert on data_size in virtio_blk_request() virtio-blk: add DISCARD and WRITE_ZEROES features virtio-blk: set config size depending on the features enabled virtio-net: make VirtIOFeature usable for other virtio devices virtio-blk: add "discard" and "write-zeroes" properties virtio-blk: add host_features field in VirtIOBlock virtio-blk: add acct_failed param to virtio_blk_handle_rw_error() hw/ide: drop iov field from IDEDMA hw/ide: drop iov field from IDEBufferedRequest hw/ide: drop iov field from IDEState tests/test-bdrv-drain: use QEMU_IOVEC_INIT_BUF migration/block: use qemu_iovec_init_buf qemu-img: use qemu_iovec_init_buf block/vmdk: use qemu_iovec_init_buf block/qed: use qemu_iovec_init_buf block/qcow2: use qemu_iovec_init_buf block/qcow: use qemu_iovec_init_buf ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-02-25virtio-blk: Increase in_flight for request restart BHKevin Wolf
virtio_blk_dma_restart_bh() submits new requests, so in order to make sure that these requests are not started inside a drained section of the attached BlockBackend, we need to make sure that draining the BlockBackend waits for the BH to be executed. This BH is still questionable because its scheduled in the main thread instead of the configured iothread. Leave a FIXME comment for this. But with this fix, enabling the data plane at least waits for these requests (in bdrv_set_aio_context()) instead of changing the AioContext under their feet and making them run in the wrong thread, causing crashes and failures (e.g. due to missing locking). Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-02-25Merge remote-tracking branch 'remotes/kraxel/tags/vga-20190222-pull-request' ↵Peter Maydell
into staging vga: bugfixes and edid support for virtio-vga # gpg: Signature made Fri 22 Feb 2019 08:24:25 GMT # gpg: using RSA key 4CB6D8EED3E87138 # gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>" [full] # gpg: aka "Gerd Hoffmann <gerd@kraxel.org>" [full] # gpg: aka "Gerd Hoffmann (private) <kraxel@gmail.com>" [full] # Primary key fingerprint: A032 8CFF B93A 17A7 9901 FE7D 4CB6 D8EE D3E8 7138 * remotes/kraxel/tags/vga-20190222-pull-request: display/virtio: add edid support. virtio-gpu: remove useless 'waiting' field virtio-gpu: block both 2d and 3d rendering virtio-gpu: remove unused config_size virtio-gpu: remove unused qdev Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-02-25Merge remote-tracking branch 'remotes/kraxel/tags/ui-20190222-pull-request' ↵Peter Maydell
into staging ui: add support for -display spice-app ui: gtk+sdl bugfixes. # gpg: Signature made Fri 22 Feb 2019 07:53:13 GMT # gpg: using RSA key 4CB6D8EED3E87138 # gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>" [full] # gpg: aka "Gerd Hoffmann <gerd@kraxel.org>" [full] # gpg: aka "Gerd Hoffmann (private) <kraxel@gmail.com>" [full] # Primary key fingerprint: A032 8CFF B93A 17A7 9901 FE7D 4CB6 D8EE D3E8 7138 * remotes/kraxel/tags/ui-20190222-pull-request: display: add -display spice-app launching a Spice client spice: use a default name for the server qapi: document DisplayType enum build-sys: add gio-2.0 check char: register spice ports after spice started char: move SpiceChardev and open_spice_port() to spice.h header spice: do not stop spice if VM is paused spice: merge options lists spice: avoid spice runtime assert char/spice: discard write() if backend is disconnected char/spice: trigger HUP event ui/gtk: Fix the license information sdl2: drop qemu_input_event_send_key_qcode call spice: set device address and device display ID in QXL interface kbd-state: don't block auto-repeat events Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-02-24tpm_tis: convert tpm_tis_show_buffer() to use trace eventLiam Merwick
cppcheck reports: [hw/tpm/tpm_tis.c:113]: (warning) %d in format string (no. 2) requires 'int' but the argument type is 'unsigned int' Rather than just converting the format specifier to use '%u", the tpm_tis_show_buffer() function is converted to use trace points and the two debug callers use the trace event infrastructure so that it's available in production cases also and not just when DEBUG_TIS is enabled. Signed-off-by: Liam Merwick <liam.merwick@oracle.com> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
2019-02-24tpm_tis: fix loop that cancels any seizure by a lower localityLiam Merwick
In tpm_tis_mmio_write() if the requesting locality is seizing access, any seizure by a lower locality is cancelled. However the loop doing the seizure had an off-by-one error and the locality immediately preceding the requesting locality was not being cleared. This is fixed by adjusting the test in the for loop to check the localities up to the requesting locality. Signed-off-by: Liam Merwick <Liam.Merwick@oracle.com> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
2019-02-22pci: Sanity test minimum downstream LNKSTAAlex Williamson
The entire link status register for SR-IOV VFs is defined as RsvdZ, reads simply return zero. Usually this is nothing more than lspci reporting inconsequentially broken values: LnkSta: Speed unknown, Width x0, ... However, now that we're using the downstream endpoint link status to fill in the value at the parent downstream port, invalid values become a problem. In particular, the PCIe hotplug driver in Linux looks for a valid negotiated link width and will fail to enumerate hot-added downstream endpoints without non-zero value here, ex: pciehp 0000:00:02.0:pcie004: Slot(0): Attention button pressed pciehp 0000:00:02.0:pcie004: Slot(0) Powering on due to button press pciehp 0000:00:02.0:pcie004: Slot(0): Card present pciehp 0000:00:02.0:pcie004: Slot(0): Link Up pciehp 0000:00:02.0:pcie004: link training error: status 0x2000 pciehp 0000:00:02.0:pcie004: Failed to check link status Resolve by using minimum width and speed values for the downstream port link status when the endpoint fails to provide valid values. Long term, we may want to implement emulation in the vfio-pci host driver to suppliment this field with the PF value as the SR-IOV spec seems to allow, but the solution here is compatible should that be implemented later. Fixes: 727b48661f75 ("pci: Sync PCIe downstream port LNKSTA on read") Reported-by: Jens Freimann <jfreimann@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Message-Id: <155060310248.19547.14979269067689441201.stgit@gimli.home> Tested-by: Jens Freimann <jfreimann@redhat.com> Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-02-22hw/smbios: fix offset of type 3 sku fieldDaniel P. Berrangé
The type 3 SMBIOS structure[1] ends with fields ... 0x14 - contained element count 0x15 - contained element record length 0x16 - sku number The smbios_type_3 struct missed the contained element record length field, causing sku number to be reported at the wrong offset. [1] https://www.dmtf.org/sites/default/files/standards/documents/DSP0134_3.1.1.pdf Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20190215153600.1770727-1-berrange@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Fixes: e41fca3da72 Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-02-22pci: Move NVIDIA vendor id to the rest of idsAlexey Kardashevskiy
sPAPR code will use it too so move it from VFIO to the common code. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20190214051440.59167-1-aik@ozlabs.ru> Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-02-22virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page sizeDavid Gibson
The virtio-balloon always works in units of 4kiB (BALLOON_PAGE_SIZE), but we can only actually discard memory in units of the host page size. Now, we handle this very badly: we silently ignore balloon requests that aren't host page aligned, and for requests that are host page aligned we discard the entire host page. The latter can corrupt guest memory if its page size is smaller than the host's. The obvious choice would be to disable the balloon if the host page size is not 4kiB. However, that would break the special case where host and guest have the same page size, but that's larger than 4kiB. That case currently works by accident[1] - and is used in practice on many production POWER systems where 64kiB has long been the Linux default page size on both host and guest. To make the balloon safe, without breaking that useful special case, we need to accumulate 4kiB balloon requests until we have a whole contiguous host page to discard. We could in principle do that across all guest memory, but it would require a large bitmap to track. This patch represents a compromise: we track ballooned subpages for a single contiguous host page at a time. This means that if the guest discards all 4kiB chunks of a host page in succession, we will discard it. This is the expected behaviour in the (host page) == (guest page) != 4kiB case we want to support. If the guest scatters 4kiB requests across different host pages, we don't discard anything, and issue a warning. Not ideal, but at least we don't corrupt guest memory as the previous version could. Warning reporting is kind of a compromise here. Determining whether we're in a problematic state at realize() time is tricky, because we'd have to look at the host pagesizes of all memory backends, but we can't really know if some of those backends could be for special purpose memory that's not subject to ballooning. Reporting only when the guest tries to balloon a partial page also isn't great because if the guest page size happens to line up it won't indicate that we're in a non ideal situation. It could also cause alarming repeated warnings whenever a migration is attempted. So, what we do is warn the first time the guest attempts balloon a partial host page, whether or not it will end up ballooning the rest of the page immediately afterwards. [1] Because when the guest attempts to balloon a page, it will submit requests for each 4kiB subpage. Most will be ignored, but the one which happens to be host page aligned will discard the whole lot. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20190214043916.22128-6-david@gibson.dropbear.id.au> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-02-22Merge remote-tracking branch 'remotes/awilliam/tags/vfio-updates-20190221.0' ↵Peter Maydell
into staging VFIO updates 2019-02-21 - Workaround kernel overflow bug in vfio type1 DMA unmap (Alex Williamson) - Refactor vfio container initialization (Eric Auger) # gpg: Signature made Fri 22 Feb 2019 05:21:07 GMT # gpg: using RSA key 239B9B6E3BB08B22 # gpg: Good signature from "Alex Williamson <alex.williamson@redhat.com>" [full] # gpg: aka "Alex Williamson <alex@shazbot.org>" [full] # gpg: aka "Alex Williamson <alwillia@redhat.com>" [full] # gpg: aka "Alex Williamson <alex.l.williamson@gmail.com>" [full] # Primary key fingerprint: 42F6 C04E 540B D1A9 9E7B 8A90 239B 9B6E 3BB0 8B22 * remotes/awilliam/tags/vfio-updates-20190221.0: hw/vfio/common: Refactor container initialization vfio/common: Work around kernel overflow bug in DMA unmap Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-02-22Merge remote-tracking branch 'remotes/rth/tags/pull-hppa-20190221' into stagingPeter Maydell
Fix dino pci config access. # gpg: Signature made Thu 21 Feb 2019 19:03:26 GMT # gpg: using RSA key 64DF38E8AF7E215F # gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [full] # Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A 05C0 64DF 38E8 AF7E 215F * remotes/rth/tags/pull-hppa-20190221: hw/hppa/dino: mask out lower 2 bits of PCI config addr Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-02-22Merge remote-tracking branch ↵Peter Maydell
'remotes/amarkovic/tags/mips-queue-feb-21-2019-v2' into staging MIPS queue for February 21st, 2019, v2 # gpg: Signature made Thu 21 Feb 2019 18:37:04 GMT # gpg: using RSA key D4972A8967F75A65 # gpg: Good signature from "Aleksandar Markovic <amarkovic@wavecomp.com>" [unknown] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: 8526 FBF1 5DA3 811F 4A01 DD75 D497 2A89 67F7 5A65 * remotes/amarkovic/tags/mips-queue-feb-21-2019-v2: target/mips: fulong2e: Dynamically generate SPD EEPROM data target/mips: fulong2e: Fix bios flash size hw/pci-host/bonito.c: Add PCI mem region mapped at the correct address target/mips: implement QMP query-cpu-definitions command tests/tcg: target/mips: Add wrappers for MSA integer compare instructions tests/tcg: target/mips: Change directory name 'bit-counting' to 'bit-count' tests/tcg: target/mips: Correct path to headers in some test source files hw/misc: mips_itu: Fix 32/64 bit issue in a line involving shift operator Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-02-22virtio-blk: add DISCARD and WRITE_ZEROES featuresStefano Garzarella
This patch adds the support of DISCARD and WRITE_ZEROES commands, that have been introduced in the virtio-blk protocol to have better performance when using SSD backend. We support only one segment per request since multiple segments are not widely used and there are no userspace APIs that allow applications to submit multiple segments in a single call. Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20190221103314.58500-7-sgarzare@redhat.com Message-Id: <20190221103314.58500-7-sgarzare@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-02-22virtio-blk: set config size depending on the features enabledStefano Garzarella
Starting from DISABLE and WRITE_ZEROES features, we use an array of VirtIOFeature (as virtio-net) to properly set the config size depending on the features enabled. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20190221103314.58500-6-sgarzare@redhat.com Message-Id: <20190221103314.58500-6-sgarzare@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-02-22virtio-net: make VirtIOFeature usable for other virtio devicesStefano Garzarella
In order to use VirtIOFeature also in other virtio devices, we move its declaration and the endof() macro (renamed in virtio_endof()) in virtio.h. We add virtio_feature_get_config_size() function to iterate the array of VirtIOFeature and to return the config size depending on the features enabled. (as virtio_net_set_config_size() did) Suggested-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20190221103314.58500-5-sgarzare@redhat.com Message-Id: <20190221103314.58500-5-sgarzare@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-02-22virtio-blk: add "discard" and "write-zeroes" propertiesStefano Garzarella
In order to avoid migration issues, we enable DISCARD and WRITE_ZEROES features only for machine type >= 4.0 As discussed with Michael S. Tsirkin and Stefan Hajnoczi on the list [1], DISCARD operation should not have security implications (eg. page cache attacks), so we can enable it by default. [1] https://lists.gnu.org/archive/html/qemu-devel/2019-02/msg00504.html Suggested-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20190221103314.58500-4-sgarzare@redhat.com Message-Id: <20190221103314.58500-4-sgarzare@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-02-22virtio-blk: add host_features field in VirtIOBlockStefano Garzarella
Since configurable features for virtio-blk are growing, this patch adds host_features field in the struct VirtIOBlock. (as in virtio-net) In this way, we can avoid to add new fields for new properties and we can directly set VIRTIO_BLK_F* flags in the host_features. We update "config-wce" and "scsi" property definition to use the new host_features field without change the behaviour. Suggested-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20190221103314.58500-3-sgarzare@redhat.com Message-Id: <20190221103314.58500-3-sgarzare@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-02-22virtio-blk: add acct_failed param to virtio_blk_handle_rw_error()Stefano Garzarella
We add acct_failed param in order to use virtio_blk_handle_rw_error() also when is not required to call block_acct_failed(). (eg. a discard operation is failed) Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20190221103314.58500-2-sgarzare@redhat.com Message-Id: <20190221103314.58500-2-sgarzare@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-02-22hw/ide: drop iov field from IDEDMAVladimir Sementsov-Ogievskiy
@iov is used only to initialize @qiov. Let's use new qemu_iovec_init_buf() instead, which simplifies the code. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20190218140926.333779-18-vsementsov@virtuozzo.com Message-Id: <20190218140926.333779-18-vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-02-22hw/ide: drop iov field from IDEBufferedRequestVladimir Sementsov-Ogievskiy
@iov is used only to initialize @qiov. Let's use new qemu_iovec_init_buf() instead, which simplifies the code. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20190218140926.333779-17-vsementsov@virtuozzo.com Message-Id: <20190218140926.333779-17-vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-02-22hw/ide: drop iov field from IDEStateVladimir Sementsov-Ogievskiy
@iov is used only to initialize @qiov. Let's use new qemu_iovec_init_buf() instead, which simplifies the code. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20190218140926.333779-16-vsementsov@virtuozzo.com Message-Id: <20190218140926.333779-16-vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-02-22display/virtio: add edid support.Gerd Hoffmann
This patch adds EDID support to the family of virtio-gpu devices. It is turned off by default, use the new edid property to enable it. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Message-id: 20190221081054.13853-1-kraxel@redhat.com
2019-02-22virtio-gpu: remove useless 'waiting' fieldMarc-André Lureau
Let's check renderer_blocked instead directly. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Christophe Fergeau <cfergeau@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20190221114330.17968-5-marcandre.lureau@redhat.com Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2019-02-22virtio-gpu: block both 2d and 3d renderingMarc-André Lureau
Now that 2d commands are translated to 3d rendering, qemu must stop sending 3d updates (from 2d) to Spice as well. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1674324 Cc: cfergeau@redhat.com Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Christophe Fergeau <cfergeau@redhat.com> Tested-by: Christophe Fergeau <cfergeau@redhat.com> Message-id: 20190221114330.17968-4-marcandre.lureau@redhat.com Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2019-02-22virtio-gpu: remove unused config_sizeMarc-André Lureau
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Christophe Fergeau <cfergeau@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20190221114330.17968-3-marcandre.lureau@redhat.com Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2019-02-22virtio-gpu: remove unused qdevMarc-André Lureau
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Christophe Fergeau <cfergeau@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20190221114330.17968-2-marcandre.lureau@redhat.com Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2019-02-21hw/vfio/common: Refactor container initializationEric Auger
We introduce the vfio_init_container_type() helper. It computes the highest usable iommu type and then set the container and the iommu type. Its usage in vfio_connect_container() makes the code ready for addition of new iommu types. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-02-21vfio/common: Work around kernel overflow bug in DMA unmapAlex Williamson
A kernel bug was introduced in v4.15 via commit 71a7d3d78e3c which adds a test for address space wrap-around in the vfio DMA unmap path. Unfortunately due to overflow, the kernel detects an unmap of the last page in the 64-bit address space as a wrap-around. In QEMU, a Q35 guest with VT-d emulation and guest IOMMU enabled will attempt to make such an unmap request during VM system reset, triggering an error: qemu-kvm: VFIO_UNMAP_DMA: -22 qemu-kvm: vfio_dma_unmap(0x561f059948f0, 0xfef00000, 0xffffffff01100000) = -22 (Invalid argument) Here the IOVA start address (0xfef00000) and the size parameter (0xffffffff01100000) add to exactly 2^64, triggering the bug. A kernel fix is queued for the Linux v5.0 release to address this. This patch implements a workaround to retry the unmap, excluding the final page of the range when we detect an unmap failing which matches the requirements for this issue. This is expected to be a safe and complete workaround as the VT-d address space does not extend to the full 64-bit space and therefore the last page should never be mapped. This workaround can be removed once all kernels with this bug are sufficiently deprecated. Link: https://bugzilla.redhat.com/show_bug.cgi?id=1662291 Reported-by: Pei Zhang <pezhang@redhat.com> Debugged-by: Peter Xu <peterx@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>