path: root/hw
Age | Commit message | Author
2016-10-17pc: Register TYPE_PC_MACHINE properties as class propertiesEduardo Habkost
Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17machine: Register TYPE_MACHINE properties as class propertiesEduardo Habkost
When doing the conversion, the NULL errp arguments on the property registration calls were changed to &error_abort. Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17intel_iommu: reject broken EIMRadim Krčmář
Cluster x2APIC cannot work without KVM's x2apic API when the maximal APIC ID is greater than 8 and only KVM's LAPIC can support x2APIC, so we forbid other APICs and also the old KVM case with less than 9, to simplify the code. There is no point in enabling EIM in forbidden APICs, so we keep it enabled only for the KVM APIC; unconditionally, because making the option depend on KVM version would be a maintenance burden. Old QEMUs would enable eim whenever intremap was on, which would trick guests into thinking that they can enable cluster x2APIC even if any interrupt destination would get clamped to 8 bits. Depending on your configuration, QEMU could notice that the destination LAPIC is not present and report it with a very non-obvious: KVM: injection failed, MSI lost (Operation not permitted) Or the guest could say something about unexpected interrupts, because clamping leads to aliasing so interrupts were being delivered to incorrect VCPUs. KVM_X2APIC_API is the feature that allows us to enable EIM for KVM. QEMU 2.7 allowed EIM whenever interrupt remapping was enabled. In order to keep backward compatibility, we again allow guests to misbehave in non-obvious ways, and make it the default for old machine types. A user can enable the buggy mode with "x-buggy-eim=on". Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
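The aliasing described above can be illustrated with a minimal sketch. The helper names are hypothetical and are not QEMU functions; the sketch only assumes that a clamped path keeps the low 8 bits of the destination ID.

```c
#include <stdint.h>

/* Hypothetical helper: models an interrupt path that keeps only the
 * low 8 bits of a 32-bit destination APIC ID. */
static inline uint8_t clamp_dest_id(uint32_t dest_id)
{
    return (uint8_t)(dest_id & 0xffu);
}

/* Two distinct destination IDs alias when they collide after clamping,
 * so an interrupt meant for one VCPU can reach the other. */
static inline int dest_ids_alias(uint32_t a, uint32_t b)
{
    return a != b && clamp_dest_id(a) == clamp_dest_id(b);
}
```

For instance, destination 0x105 clamps to 0x05, so it would alias with the VCPU whose ID is 0x05.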
2016-10-17intel_iommu: add OnOffAuto intr_eim as "eim" propertyRadim Krčmář
The default (auto) emulates the current behavior. A user can now control EIM like -device intel-iommu,intremap=on,eim=off Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17intel_iommu: redo configuration check in realizeRadim Krčmář
* there is no point in configuring the device if realization is going to fail, so move the check to the beginning, * create a separate function for the check, * use error_setg() instead of error_report(). Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17intel_iommu: pass whole remapped addresses to apicRadim Krčmář
The MMIO interface to APIC only allowed 8 bit addresses, which is not enough for 32 bit addresses from EIM remapping. Intel stored upper 24 bits in the high MSI address, so use the same technique. The technique is also used in KVM MSI interface. Other APICs are unlikely to handle those upper bits. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
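A rough sketch of the technique, under stated assumptions: the low 8 bits of the destination ID sit in bits 19:12 of the low MSI address (the usual x86 MSI layout), and the upper 24 bits sit in bits 31:8 of the high MSI address. The `MsiAddr` type and both function names are hypothetical, not QEMU's.

```c
#include <stdint.h>

#define MSI_ADDR_BASE       0xfee00000u /* x86 MSI address window */
#define MSI_ADDR_DEST_SHIFT 12          /* dest ID bits 7:0 -> address bits 19:12 */

/* Hypothetical container for the two halves of an MSI address. */
typedef struct {
    uint32_t addr_lo;
    uint32_t addr_hi;
} MsiAddr;

/* Split a 32-bit destination ID across the low and high address words. */
MsiAddr msi_addr_encode(uint32_t dest_id)
{
    MsiAddr a;
    a.addr_lo = MSI_ADDR_BASE | ((dest_id & 0xffu) << MSI_ADDR_DEST_SHIFT);
    a.addr_hi = dest_id & 0xffffff00u;  /* assumed placement: bits 31:8 */
    return a;
}

/* Reassemble the full 32-bit destination ID from both address words. */
uint32_t msi_addr_decode(MsiAddr a)
{
    return (a.addr_hi & 0xffffff00u) |
           ((a.addr_lo >> MSI_ADDR_DEST_SHIFT) & 0xffu);
}
```

A receiver that ignores `addr_hi` recovers only the low 8 bits, which is why APICs other than the one that understands the extended format are unlikely to handle such messages.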
2016-10-17apic: add send_msi() to APICCommonClassRadim Krčmář
The MMIO based interface to APIC doesn't work well with MSIs that have upper address bits set (remapped x2APIC MSIs). A specialized interface is a quick and dirty way to avoid the shortcoming. Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17apic: add global apic_get_class()Radim Krčmář
Every configuration has only up to one APIC class and we'll be extending the class with a function that can be called without an instanced object, so a direct access to the class is convenient. This patch will break compilation if some code uses apic_get_class() with CONFIG_USER_ONLY. Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17vfio: fix duplicate function callCao jin
When a vfio device is reset (on FLR or bus reset) and a bus reset is needed (vfio_pci_hot_reset_one is called), vfio_pci_pre_reset and vfio_pci_post_reset are called twice. Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17vfio/pci: Fix vfio_rtl8168_quirk_data_read address offsetThorsten Kohfeldt
Introductory comment for rtl8168 VFIO MSI-X quirk states: At BAR2 offset 0x70 there is a dword data register, offset 0x74 is a dword address register. vfio: vfio_bar_read(0000:05:00.0:BAR2+0x70, 4) = 0xfee00398 // read data Thus, correct offset for data read is 0x70, but function vfio_rtl8168_quirk_data_read() wrongfully uses offset 0x74. Signed-off-by: Thorsten Kohfeldt <thorsten.kohfeldt@gmx.de> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17vfio/pci: Handle host oversightEric Auger
In case the end-user calls qemu with -vfio-pci option without passing either sysfsdev or host property value, the device is interpreted as 0000:00:00.0. Let's create a specific error message to guide the end-user. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17vfio/pci: Remove vfio_populate_device returned valueEric Auger
The returned value (either -errno or -1) is not used anymore by the caller, vfio_realize, since the error now is stored in the error object. So let's remove it. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17vfio/pci: Remove vfio_msix_early_setup returned valueEric Auger
The returned value is not used anymore by the caller, vfio_realize, since the error now is stored in the error object. So let's remove it. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17vfio/pci: Conversion to realizeEric Auger
This patch converts VFIO PCI to realize function. Also original initfn errors now are propagated using QEMU error objects. All errors are formatted with the same pattern: "vfio: %s: the error description" Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
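The propagation pattern this series moves to can be sketched in a self-contained way. Everything below is a simplified stand-in: `Err`, `err_setf`, `populate_device` and `device_realize` are hypothetical names, not QEMU's Error API or the vfio functions, and the region-count check is invented for illustration. The point is only that the leaf fills in an error object using the "vfio: %s: description" pattern and the realize-like caller passes it up instead of printing it.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for an error object. */
typedef struct {
    char msg[160];
} Err;

/* Set *errp to a newly allocated, formatted error unless one is already set. */
static void err_setf(Err **errp, const char *fmt, ...)
{
    va_list ap;
    Err *e;

    if (!errp || *errp) {
        return;
    }
    e = malloc(sizeof(*e));
    if (!e) {
        return;
    }
    va_start(ap, fmt);
    vsnprintf(e->msg, sizeof(e->msg), fmt, ap);
    va_end(ap);
    *errp = e;
}

/* Leaf step: describes the failure in the error object, prints nothing. */
int populate_device(const char *name, int nregions, Err **errp)
{
    if (nregions < 9) {  /* illustrative check only */
        err_setf(errp, "vfio: %s: unexpected number of regions %d",
                 name, nregions);
        return -1;
    }
    return 0;
}

/* Realize-like caller: stops on failure and leaves the error to its caller. */
void device_realize(const char *name, int nregions, Err **errp)
{
    if (populate_device(name, nregions, errp) < 0) {
        return;
    }
    /* further setup steps would follow here */
}
```

The top-level caller owns the error object and decides whether to report or free it.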
2016-10-17vfio/platform: Pass an error object to vfio_base_device_initEric Auger
This patch propagates errors encountered during vfio_base_device_init up to the realize function. In case the host value is not set or badly formed we now report an error. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17vfio/platform: fix a wrong returned value in vfio_populate_deviceEric Auger
In case the vfio_init_intp fails we currently do not return an error value. This patch fixes the bug. The returned value is not explicit but in practice the error object is the one used to report the error to the end-user and the actual returned error value is not used. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17vfio/platform: Pass an error object to vfio_populate_deviceEric Auger
Propagate the vfio_populate_device errors up to vfio_base_device_init. The error object also is passed to vfio_init_intp. At the moment we only report the error. Subsequent patches will propagate the error up to the realize function. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17vfio: Pass an error object to vfio_get_deviceEric Auger
Pass an error object to prepare for migration to VFIO-PCI realize. In vfio platform vfio_base_device_init we currently just report the error. Subsequent patches will propagate the error up to the realize function. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17vfio: Pass an error object to vfio_get_groupEric Auger
Pass an error object to prepare for migration to VFIO-PCI realize. For the time being let's just simply report the error in vfio platform's vfio_base_device_init(). A subsequent patch will duly propagate the error up to vfio_platform_realize. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17vfio: Pass an Error object to vfio_connect_containerEric Auger
The error is currently simply reported in vfio_get_group. Don't bother too much with the prefix which will be handled at upper level, later on. Also return an error value in case container->error is not 0 and the container is torn down. On vfio_spapr_remove_window failure, we also report an error whereas it was silent before. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17vfio/pci: Pass an error object to vfio_pci_igd_opregion_initEric Auger
Pass an error object to prepare for migration to VFIO-PCI realize. In vfio_probe_igd_bar4_quirk, simply report the error. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17vfio/pci: Pass an error object to vfio_add_capabilitiesEric Auger
Pass an error object to prepare for migration to VFIO-PCI realize. The error is cascaded down to vfio_add_std_cap and then vfio_msi(x)_setup, vfio_setup_pcie_cap. vfio_add_ext_cap does not return anything other than 0 so let's transform it into a void function. Also use pci_add_capability2 which takes an error object. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17vfio/pci: Pass an error object to vfio_intx_enableEric Auger
Pass an error object to prepare for migration to VFIO-PCI realize. The error object is propagated down to vfio_intx_enable_kvm(). The three other callers, vfio_intx_enable_kvm(), vfio_msi_disable_common() and vfio_pci_post_reset() do not propagate the error and simply call error_reportf_err() with the ERR_PREFIX formatting. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17vfio/pci: Pass an error object to vfio_msix_early_setupEric Auger
Pass an error object to prepare for migration to VFIO-PCI realize. The returned value will be removed later on. We now format an error in case of reading failure for the MSIX flags, the MSIX table, or the MSIX PBA. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17vfio/pci: Pass an error object to vfio_populate_deviceEric Auger
Pass an error object to prepare for migration to VFIO-PCI realize. The returned value will be removed later on. The case where error recovery cannot be enabled is not converted into an error object but directly reported through error_report, as before. Populating an error instead would cause the future realize function to fail, which is not wanted. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17vfio/pci: Pass an error object to vfio_populate_vgaEric Auger
Pass an error object to prepare for the same operation in vfio_populate_device. Eventually this contributes to the migration to VFIO-PCI realize. We now report an error on vfio_get_region_info failure. vfio_probe_igd_bar4_quirk is not involved in the migration to realize and simply calls error_reportf_err. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17vfio/pci: Use local error object in vfio_initfnEric Auger
To prepare for migration to realize, let's use a local error object in vfio_initfn. Also let's use the same error prefix for all error messages. On top of the 1-1 conversion, we start using a common error prefix for all error messages. We also introduce a similar warning prefix which will be used later on. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-17Merge remote-tracking branch 'remotes/gkurz/tags/for-upstream' into stagingPeter Maydell
This pull request contains: - a patch to add a vdc->reset() handler to virtio-9p - a bunch of patches to fix various memory leaks (thanks to Li Qiang) - some code cleanups for 9pfs # gpg: Signature made Mon 17 Oct 2016 16:01:46 BST # gpg: using DSA key 0x02FC3AEB0101DBC2 # gpg: Good signature from "Greg Kurz <groug@kaod.org>" # gpg: aka "Greg Kurz <groug@free.fr>" # gpg: aka "Greg Kurz <gkurz@fr.ibm.com>" # gpg: aka "Greg Kurz <gkurz@linux.vnet.ibm.com>" # gpg: aka "Gregory Kurz (Groug) <groug@free.fr>" # gpg: aka "Gregory Kurz (Cimai Technology) <gkurz@cimai.com>" # gpg: aka "Gregory Kurz (Meiosys Technology) <gkurz@meiosys.com>" # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: 2BD4 3B44 535E C0A7 9894 DBA2 02FC 3AEB 0101 DBC2 * remotes/gkurz/tags/for-upstream: 9pfs: fix memory leak in v9fs_write 9pfs: fix memory leak in v9fs_link 9pfs: fix memory leak in v9fs_xattrcreate 9pfs: fix information leak in xattr read virtio-9p: add reset handler 9pfs: only free completed request if not flushed 9pfs: drop useless check in pdu_free() 9pfs: use coroutine_fn annotation in hw/9pfs/9p.[ch] 9pfs: use coroutine_fn annotation in hw/9pfs/co*.[ch] 9pfs: fsdev: drop useless extern annotation for functions 9pfs: fix potential host memory leak in v9fs_read 9pfs: allocate space for guest originated empty strings Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-10-179pfs: fix memory leak in v9fs_writeLi Qiang
If an error occurs when marshalling the transfer length to the guest, the v9fs_write() function doesn't free an IO vector, thus leading to a memory leak. This patch fixes the issue. Signed-off-by: Li Qiang <liqiang6-s@360.cn> Reviewed-by: Greg Kurz <groug@kaod.org> [groug, rephrased the changelog] Signed-off-by: Greg Kurz <groug@kaod.org>
2016-10-179pfs: fix memory leak in v9fs_linkLi Qiang
The v9fs_link() function keeps a reference on the source fid object. This causes a memory leak since the reference never goes down to 0. This patch fixes the issue. Signed-off-by: Li Qiang <liqiang6-s@360.cn> Reviewed-by: Greg Kurz <groug@kaod.org> [groug, rephrased the changelog] Signed-off-by: Greg Kurz <groug@kaod.org>
2016-10-179pfs: fix memory leak in v9fs_xattrcreateLi Qiang
The 'fs.xattr.value' field in V9fsFidState object doesn't consider the situation that this field has been allocated previously. Every time, it will be allocated directly. This leads to a host memory leak issue if the client sends another Txattrcreate message with the same fid number before the fid from the previous time got clunked. Signed-off-by: Li Qiang <liqiang6-s@360.cn> Reviewed-by: Greg Kurz <groug@kaod.org> [groug, updated the changelog to indicate how the leak can occur] Signed-off-by: Greg Kurz <groug@kaod.org>
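The leak pattern and one way to avoid it can be sketched as follows. `XattrState` and `xattr_prepare` are hypothetical stand-ins, not the V9fsFidState layout or the actual fix; the sketch only shows releasing a previously allocated value before installing a new one.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the xattr part of a fid. */
typedef struct {
    void  *value;
    size_t len;
} XattrState;

/* Install a fresh buffer of len bytes, releasing any earlier one first.
 * Returns 0 on success, -1 on allocation failure (old buffer is kept). */
int xattr_prepare(XattrState *xs, size_t len)
{
    void *buf = malloc(len ? len : 1);

    if (!buf) {
        return -1;
    }
    free(xs->value);   /* free(NULL) is a no-op, so a first call is fine */
    xs->value = buf;
    xs->len = len;
    return 0;
}
```

Calling `xattr_prepare()` twice on the same state therefore leaves exactly one live allocation.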
2016-10-179pfs: fix information leak in xattr readLi Qiang
9pfs uses g_malloc() to allocate the xattr memory space. If the guest reads this memory before writing to it, this will leak host heap memory to the guest. This patch avoids this. Signed-off-by: Li Qiang <liqiang6-s@360.cn> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Greg Kurz <groug@kaod.org>
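One common remedy for this class of bug is to hand out zero-filled memory, so a read that precedes any write sees only zeros. The sketch below uses plain `calloc()` and a hypothetical helper name; it is an illustration of the idea, not necessarily the change made by this patch.

```c
#include <stdlib.h>

/* Allocate len bytes whose contents are all zero before the first write,
 * so stale heap data cannot be read back. Caller frees the result. */
unsigned char *alloc_zeroed(size_t len)
{
    return calloc(len ? len : 1, 1);
}
```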
2016-10-17virtio-9p: add reset handlerGreg Kurz
Virtio devices should implement the VirtIODevice->reset() function to perform necessary cleanup actions and to bring the device to a quiescent state. In the case of the virtio-9p device, this means: - emptying the list of active PDUs (i.e. draining all in-flight I/O) - freeing all fids (i.e. close open file descriptors and free memory) That's what this patch does. The reset handler first waits for all active PDUs to complete. Since completion happens in the QEMU global aio context, we just have to loop around aio_poll() until the active list is empty. The freeing part involves some actions to be performed on the backend, like closing file descriptors or flushing extended attributes to the underlying filesystem. The virtfs_reset() function already does the job: it calls free_fid() for all open fids not involved in an ongoing I/O operation. We are sure this is the case since we have drained the PDU active list. The current code implements all backend accesses with coroutines, but we want to stay synchronous on the reset path. We can either change the current code to be able to run when not in coroutine context, or create a coroutine context and wait for virtfs_reset() to complete. This patch goes for the latter because it results in simpler code. Note that we also need to create a dummy PDU because it is also an API to pass the FsContext pointer to all backend callbacks. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-10-179pfs: only free completed request if not flushedGreg Kurz
If a PDU has a flush request pending, the current code calls pdu_free() twice: 1) pdu_complete()->pdu_free() with pdu->cancelled set, which does nothing 2) v9fs_flush()->pdu_free() with pdu->cancelled cleared, which moves the PDU back to the free list. This works but it complicates the logic of pdu_free(). With this patch, pdu_complete() only calls pdu_free() if no flush request is pending, i.e. qemu_co_queue_next() returns false. Since pdu_free() is now supposed to be called with pdu->cancelled cleared, the check in pdu_free() is dropped and replaced by an assertion. Signed-off-by: Greg Kurz <groug@kaod.org>
2016-10-179pfs: drop useless check in pdu_free()Greg Kurz
Out of the three users of pdu_free(), none ever passes a NULL pointer to this function. Signed-off-by: Greg Kurz <groug@kaod.org>
2016-10-179pfs: use coroutine_fn annotation in hw/9pfs/9p.[ch]Greg Kurz
All these functions either call the v9fs_co_* functions which have the coroutine_fn annotation, or pdu_complete() which calls qemu_co_queue_next(). Let's mark them to make it obvious they execute in coroutine context. Signed-off-by: Greg Kurz <groug@kaod.org>
2016-10-179pfs: use coroutine_fn annotation in hw/9pfs/co*.[ch]Greg Kurz
All these functions use the v9fs_co_run_in_worker() macro, and thus always call qemu_coroutine_self() and qemu_coroutine_yield(). Let's mark them to make it obvious they execute in coroutine context. Signed-off-by: Greg Kurz <groug@kaod.org>
2016-10-179pfs: fsdev: drop useless extern annotation for functionsGreg Kurz
Signed-off-by: Greg Kurz <groug@kaod.org>
2016-10-179pfs: fix potential host memory leak in v9fs_readLi Qiang
The 9pfs read dispatch function doesn't free two QEMUIOVector objects, thus causing a potential memory leak. This patch avoids this. Signed-off-by: Li Qiang <liqiang6-s@360.cn> Signed-off-by: Greg Kurz <groug@kaod.org>
2016-10-179pfs: allocate space for guest originated empty stringsLi Qiang
If a guest sends an empty string parameter to any 9P operation, the current code unmarshals it into a V9fsString equal to { .size = 0, .data = NULL }. This is unfortunate because it can cause NULL pointer dereference to happen at various locations in the 9pfs code. And we don't want to check str->data everywhere we pass it to strcmp() or any other function which expects a dereferenceable pointer. This patch enforces the allocation of genuine C empty strings instead, so callers don't have to bother. Out of all v9fs_iov_vunmarshal() users, only v9fs_xattrwalk() checks if the returned string is empty. It now uses v9fs_string_size() since name.data cannot be NULL anymore. Signed-off-by: Li Qiang <liqiang6-s@360.cn> [groug, rewritten title and changelog, fix empty string check in v9fs_xattrwalk()] Signed-off-by: Greg Kurz <groug@kaod.org>
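A minimal sketch of the idea, assuming a string type shaped like the { .size, .data } pair mentioned above. `Str` and `str_unmarshal` are hypothetical names, not the 9pfs unmarshalling code; the point is that a zero-length input still yields a valid, NUL-terminated buffer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical string type modeled on a size/data pair. */
typedef struct {
    size_t size;
    char  *data;
} Str;

/* Copy len bytes from wire into s and always NUL-terminate, so even an
 * empty input produces a real "" rather than a NULL pointer.
 * Returns 0 on success, -1 on allocation failure. */
int str_unmarshal(Str *s, const char *wire, size_t len)
{
    char *buf = malloc(len + 1);

    if (!buf) {
        return -1;
    }
    if (len) {
        memcpy(buf, wire, len);
    }
    buf[len] = '\0';
    s->size = len;
    s->data = buf;
    return 0;
}
```

Callers can then pass `s.data` to `strcmp()` and similar functions without checking it for NULL, and test emptiness through `s.size`.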
2016-10-16spapr: Improved placement of PCI host bridges in guest memory mapDavid Gibson
Currently, the MMIO space for accessing PCI on pseries guests begins at 1 TiB in guest address space. Each PCI host bridge (PHB) has a 64 GiB chunk of address space in which it places its outbound PIO and 32-bit and 64-bit MMIO windows. This scheme has several problems: - It limits guest RAM to 1 TiB (though we have a limited fix for this now) - It limits the total MMIO window to 64 GiB. This is not always enough for some of the large nVidia GPGPU cards - Putting all the windows into a single 64 GiB area means that naturally aligning things within there will waste more address space. In addition there was a miscalculation in some of the defaults, which meant that the MMIO windows for each PHB actually slightly overran the 64 GiB region for that PHB. We got away without nasty consequences because the overrun fit within an unused area at the beginning of the next PHB's region, but it's not pretty. This patch implements a new scheme which addresses those problems, and is also closer to what bare metal hardware and pHyp guests generally use. Because some guest versions (including most current distro kernels) can't access PCI MMIO above 64 TiB, we put all the PCI windows between 32 TiB and 64 TiB. This is broken into 1 TiB chunks. The first 1 TiB contains the PIO (64 kiB) and 32-bit MMIO (2 GiB) windows for all of the PHBs. Each subsequent TiB chunk contains a naturally aligned 64-bit MMIO window for one PHB each. This reduces the number of allowed PHBs (without full manual configuration of all the windows) from 256 to 31, but this should still be plenty in practice. We also change some of the default window sizes for manually configured PHBs to saner values. Finally we adjust some tests and libqos so that it correctly uses the new default locations. Ideally it would parse the device tree given to the guest, but that's a more complex problem for another time. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
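The new scheme can be sketched as simple address arithmetic. The sizes and the 32 TiB..64 TiB range come from the description above; how the PIO and 32-bit windows are packed inside the first TiB is an assumption made for this sketch, and `PhbWindows` and `phb_place` are hypothetical names.

```c
#include <stdint.h>

#define KiB (1ULL << 10)
#define GiB (1ULL << 30)
#define TiB (1ULL << 40)

#define PCI_BASE       (32 * TiB)
#define PCI_LIMIT      (64 * TiB)
#define PIO_WIN_SIZE   (64 * KiB)
#define MEM32_WIN_SIZE (2 * GiB)
#define MEM64_WIN_SIZE (1 * TiB)
/* One TiB chunk is shared; each remaining chunk serves one PHB: 31 in total. */
#define MAX_PHBS       ((PCI_LIMIT - PCI_BASE) / MEM64_WIN_SIZE - 1)

/* Hypothetical result type: guest-physical base of each window. */
typedef struct {
    uint64_t pio;
    uint64_t mmio32;
    uint64_t mmio64;
} PhbWindows;

/* Compute default window bases for the PHB at the given index.
 * Returns 0 on success, -1 if the index is out of range. */
int phb_place(uint32_t index, PhbWindows *w)
{
    if (index >= MAX_PHBS) {
        return -1;
    }
    /* Assumed packing inside the shared first TiB: PIO windows first,
     * then one naturally aligned 2 GiB window per PHB. */
    w->pio    = PCI_BASE + index * PIO_WIN_SIZE;
    w->mmio32 = PCI_BASE + (index + 1) * MEM32_WIN_SIZE;
    /* Each later TiB chunk holds one naturally aligned 64-bit window. */
    w->mmio64 = PCI_BASE + (index + 1) * MEM64_WIN_SIZE;
    return 0;
}
```

With these numbers the last PHB (index 30) gets a 64-bit window that ends exactly at 64 TiB, and all 32-bit windows end at 64 GiB into the first chunk.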
2016-10-16spapr_pci: Add a 64-bit MMIO windowDavid Gibson
On real hardware, and under pHyp, the PCI host bridges on Power machines typically advertise two outbound MMIO windows from the guest's physical memory space to PCI memory space: - A 32-bit window which maps onto 2GiB..4GiB in the PCI address space - A 64-bit window which maps onto a large region somewhere high in PCI address space (traditionally this used an identity mapping from guest physical address to PCI address, but that's not always the case) The qemu implementation in spapr-pci-host-bridge, however, only supports a single outbound MMIO window. At least some Linux versions expect the two windows, so we arranged this window to map onto the PCI memory space from 2 GiB..~64 GiB, then advertised it as two contiguous windows, the "32-bit" window from 2G..4G and the "64-bit" window from 4G..~64G. This approach means, however, that the 64G window is not naturally aligned. In turn this limits the size of the largest BAR we can map (which does have to be naturally aligned) to roughly half of the total window. With some large nVidia GPGPU cards which have huge memory BARs, this is starting to be a problem. This patch adds true support for separate 32-bit and 64-bit outbound MMIO windows to the spapr-pci-host-bridge implementation, each of which can be independently configured. The 32-bit window always maps to 2G.. in PCI space, but the PCI address of the 64-bit window can be configured (it defaults to the same as the guest physical address). So as not to break possible existing configurations, as long as a 64-bit window is not specified, a large single window can be specified. This will appear the same way to the guest as the old approach, although it's now implemented by two contiguous memory regions rather than a single one. For now, this only adds the possibility of 64-bit windows. The default configuration still uses the legacy mode. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
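The "roughly half" limit follows from natural alignment: a BAR of size S must start at a multiple of S. A small sketch (the function name is hypothetical) finds the largest power-of-two size that fits in a window at a naturally aligned address.

```c
#include <stdint.h>

#define GiB (1ULL << 30)

/* Return the largest power-of-two size S for which some S-aligned base
 * satisfies start <= base and base + S <= end, or 0 if none exists.
 * Sizes above 2^62 are not considered in this sketch. */
uint64_t largest_aligned_bar(uint64_t start, uint64_t end)
{
    uint64_t size;

    for (size = 1ULL << 62; size != 0; size >>= 1) {
        /* Round start up to the next multiple of size. */
        uint64_t base = (start + size - 1) & ~(size - 1);

        if (base >= start && base <= end && end - base >= size) {
            return size;
        }
    }
    return 0;
}
```

For a window covering 2 GiB..64 GiB (62 GiB in total) the result is 32 GiB, placed at 32 GiB..64 GiB, whereas a window covering 0..64 GiB would admit a full 64 GiB BAR.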
2016-10-16spapr: Adjust placement of PCI host bridge to allow > 1TiB RAMDavid Gibson
Currently the default PCI host bridge for the 'pseries' machine type is constructed with its IO windows in the 1TiB..(1TiB + 64GiB) range in guest memory space. This means that if > 1TiB of guest RAM is specified, the RAM will collide with the PCI IO windows, causing serious problems. Problems won't be obvious until guest RAM goes a bit beyond 1TiB, because there's a little unused space at the bottom of the area reserved for PCI, but essentially this means that > 1TiB of RAM has never worked with the pseries machine type. This patch fixes this by altering the placement of PHBs on large-RAM VMs. Instead of always placing the first PHB at 1TiB, it is placed at the next 1 TiB boundary after the maximum RAM address. Technically, this changes behaviour in a migration-breaking way for existing machines with > 1TiB maximum memory, but since having > 1 TiB memory was broken anyway, this seems like a reasonable trade-off. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
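The placement rule can be sketched as an align-up with a 1 TiB floor. This is an interpretation of the description above, not the patch's code: the function name is hypothetical, and whether a maximum RAM address that is already exactly on a boundary moves up a further TiB is an assumption (this sketch leaves it in place).

```c
#include <stdint.h>

#define GiB (1ULL << 30)
#define TiB (1ULL << 40)

/* Place the first PHB at the first 1 TiB boundary at or above the
 * maximum RAM address, and never below the historical 1 TiB base. */
uint64_t phb0_base(uint64_t max_ram_addr)
{
    uint64_t base = (max_ram_addr + TiB - 1) & ~(TiB - 1);

    return base < TiB ? TiB : base;
}
```

Guests with at most 1 TiB of RAM keep the old base, which is why only machines that were already broken see a layout change.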
2016-10-16spapr_pci: Delegate placement of PCI host bridges to machine typeDavid Gibson
The 'spapr-pci-host-bridge' represents the virtual PCI host bridge (PHB) for a PAPR guest. Unlike on x86, it's routine on Power (both bare metal and PAPR guests) to have numerous independent PHBs, each controlling a separate PCI domain. There are two ways of configuring the spapr-pci-host-bridge device: first it can be done fully manually, specifying the locations and sizes of all the IO windows. This gives the most control, but is very awkward with 6 mandatory parameters. Alternatively just an "index" can be specified which essentially selects from an array of predefined PHB locations. The PHB at index 0 is automatically created as the default PHB. The current set of default locations causes some problems for guests with large RAM (> 1 TiB) or PCI devices with very large BARs (e.g. big nVidia GPGPU cards via VFIO). Obviously, for migration we can only change the locations on a new machine type, however. This is awkward, because the placement is currently decided within the spapr-pci-host-bridge code, so it breaks abstraction to look inside the machine type version. So, this patch delegates the "default mode" PHB placement from the spapr-pci-host-bridge device back to the machine type via a public method in sPAPRMachineClass. It's still a bit ugly, but it's about the best we can do. For now, this just changes where the calculation is done. It doesn't change the actual location of the host bridges, or any other behaviour. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2016-10-14ppc/xics: Split ICS into ics-base and ics classBenjamin Herrenschmidt
The existing implementation remains the same and ics-base is introduced. The type name "ics" is retained, and all the related functions are renamed as ics_simple_*. This will allow different implementations for the source controllers such as the MSI support of PHB3 on Power8 which uses in-memory state tables for example. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> [ clg: added ICS_BASE_GET_CLASS and related fixes, based on : http://patchwork.ozlabs.org/patch/646010/ ] Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-14ppc/xics: Make the ICSState a listBenjamin Herrenschmidt
Instead of an array of fixed sized blocks, use a list, as we will need to have sources with variable number of interrupts. SPAPR only uses a single entry. Native will create more. If performance becomes an issue we can add some hashed lookup but for now this will do fine. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [ move the initialization of list to xics_common_initfn, restore xirr_owner after migration and move restoring to icp_post_load] Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> [ clg: removed the icp_post_load() changes from nikunj patchset v3: http://patchwork.ozlabs.org/patch/646008/ ] Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
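A minimal sketch of a list-based lookup, assuming each source controller covers a contiguous range of interrupt numbers. The `Ics` type and `ics_find` are hypothetical and do not mirror the ICSState fields; the linear walk corresponds to the "no hashed lookup for now" remark above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical source controller covering [offset, offset + nr_irqs). */
typedef struct Ics {
    uint32_t    offset;
    uint32_t    nr_irqs;
    struct Ics *next;
} Ics;

/* Walk the list and return the controller that owns irq, or NULL. */
Ics *ics_find(Ics *head, uint32_t irq)
{
    Ics *ics;

    for (ics = head; ics != NULL; ics = ics->next) {
        if (irq >= ics->offset && irq - ics->offset < ics->nr_irqs) {
            return ics;
        }
    }
    return NULL;
}
```

Because each entry carries its own size, controllers with different numbers of interrupts can coexist in the same list.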
2016-10-14spapr: fix inheritance chain for default machine optionsMichael Roth
Rather than machine instances having backward-compatible option defaults that need to be repeatedly re-enabled for every new machine type we introduce, we set the defaults appropriate for newer machine types, then add code to explicitly disable instance options as needed to maintain compatibility with older machine types. Currently pseries-2.5 does not inherit from pseries-2.6 in this fashion, which is okay at the moment since we do not have any instance compatibility options for pseries-2.6+ currently. We will make use of this in future patches though, so fix it here. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> [dwg: Extended to make 2.7 inherit from 2.8 as well] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
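The inheritance chain can be pictured with a small sketch in which each older version first applies the next newer version's options and then switches off what it must not have. The struct, its fields and the function names are all hypothetical; they only illustrate the chaining, not the pseries code.

```c
/* Hypothetical per-instance options. */
typedef struct {
    int feature_a;
    int feature_b;
} MachineOpts;

/* Newest version: defaults suited to new machine types. */
void opts_2_8(MachineOpts *o)
{
    o->feature_a = 1;
    o->feature_b = 1;
}

/* Each older version inherits from the next newer one, then overrides. */
void opts_2_7(MachineOpts *o)
{
    opts_2_8(o);
    o->feature_b = 0;   /* hypothetical compat tweak for 2.7 and older */
}

void opts_2_6(MachineOpts *o)
{
    opts_2_7(o);        /* nothing to override yet, but the link is kept */
}

void opts_2_5(MachineOpts *o)
{
    opts_2_6(o);
    o->feature_a = 0;   /* hypothetical compat tweak for 2.5 and older */
}
```

If the link from 2.5 to 2.6 were missing, a compat tweak later added at 2.6 or 2.7 would silently fail to reach 2.5.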
2016-10-12usb-redir: allocate buffers before waking up the host adapterHans de Goede
Needed to make sure usb redirection is prepared to actually handle the callback from the usb host adapter. Without this interrupt endpoints don't work on xhci. Note: On ehci the usb_wakeup() call only schedules a BH for the actual work, which hides this bug because the allocation happens before ehci calls back even without this patch. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Message-id: 1476096313-7730-1-git-send-email-kraxel@redhat.com Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-10-12usb: Fix incorrect default DMA offset.Vijay Kumar B
The default DMA offset is set to 3. When the property is not set by the consumer, the default causes DMA access to be shifted by 3 bytes. In PXA, this results in incorrect DMA access, leading to error notification in the USB controller driver. A better default would be 0, so that there is no offset, when the consumer does not specify one. Signed-off-by: Vijay Kumar B. <vijaykumar@zilogic.com> Reviewed-by: Deepak S. <deepak@zilogic.com> Message-id: 1475060958-7760-1-git-send-email-vijaykumar@zilogic.com Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-10-12usb: fix serial generatorGerd Hoffmann
snprintf return value is *not* the number of chars written into the buffer, but the number of chars needed. So in case the buffer is too small you can go alloc a bigger one and try again. But that also means you can't simply use the return value for the next snprintf call without checking beforehand that things did actually fit. Problem is that usb_desc_create_serial didn't perform that check, so a loooong path string (can happen with deep pci-bridge nesting) results in the third snprintf call smashing the stack. Fix this by throwing out all the snprintf calls and using g_strdup_printf instead. https://bugzilla.redhat.com/show_bug.cgi?id=1381630 Reported-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1475659998-22045-1-git-send-email-kraxel@redhat.com
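The pitfall can be shown with standard C alone. In the sketch below, `snprintf_fit` is the check that the description says was missing, and `fmt_alloc` is a hypothetical, portable stand-in for the measure-then-allocate approach that g_strdup_printf provides; neither is the code from the patch.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Nonzero if an snprintf() return value means the output fit in a buffer
 * of the given size, including the terminating NUL. */
static inline int snprintf_fit(int ret, size_t size)
{
    return ret >= 0 && (size_t)ret < size;
}

/* Format into a freshly allocated buffer of exactly the needed size.
 * Returns NULL on failure; the caller frees the result. */
char *fmt_alloc(const char *fmt, ...)
{
    va_list ap, ap2;
    int need;
    char *buf;

    va_start(ap, fmt);
    va_copy(ap2, ap);
    need = vsnprintf(NULL, 0, fmt, ap);   /* chars needed, excluding NUL */
    va_end(ap);
    if (need < 0) {
        va_end(ap2);
        return NULL;
    }
    buf = malloc((size_t)need + 1);
    if (buf) {
        vsnprintf(buf, (size_t)need + 1, fmt, ap2);
    }
    va_end(ap2);
    return buf;
}
```

Writing a 10-character string into an 8-byte buffer makes snprintf() return 10 while storing only 7 characters plus the NUL, so using that return value as an offset for the next call would point past the end of the buffer.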