aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-10-17intel_iommu: add OnOffAuto intr_eim as "eim" propertyRadim Krčmář
The default (auto) emulates the current behavior. A user can now control EIM like -device intel-iommu,intremap=on,eim=off Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17intel_iommu: redo configuraton check in realizeRadim Krčmář
* there no point in configuring the device if realization is going to fail, so move the check to the beginning, * create a separate function for the check, * use error_setg() instead error_report(). Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17intel_iommu: pass whole remapped addresses to apicRadim Krčmář
The MMIO interface to APIC only allowed 8 bit addresses, which is not enough for 32 bit addresses from EIM remapping. Intel stored upper 24 bits in the high MSI address, so use the same technique. The technique is also used in KVM MSI interface. Other APICs are unlikely to handle those upper bits. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17apic: add send_msi() to APICCommonClassRadim Krčmář
The MMIO based interface to APIC doesn't work well with MSIs that have upper address bits set (remapped x2APIC MSIs). A specialized interface is a quick and dirty way to avoid the shortcoming. Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17apic: add global apic_get_class()Radim Krčmář
Every configuration has only up to one APIC class and we'll be extending the class with a function that can be called without an instanced object, so a direct access to the class is convenient. This patch will break compilation if some code uses apic_get_class() with CONFIG_USER_ONLY. Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17target-i386: Move warning code outside x86_cpu_filter_features()Eduardo Habkost
x86_cpu_filter_features() will be reused by code that shouldn't print any warning. Move the warning code to a new x86_cpu_report_filtered_features() function, and call it from x86_cpu_realizefn(). Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17qmp: Add runnability information to query-cpu-definitionsEduardo Habkost
Add a new optional field to query-cpu-definitions schema: "unavailable-features". It will contain a list of QOM properties that prevent the CPU model from running in the current host. Cc: David Hildenbrand <dahi@linux.vnet.ibm.com> Cc: Michael Mueller <mimu@linux.vnet.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Cornelia Huck <cornelia.huck@de.ibm.com> Cc: Jiri Denemark <jdenemar@redhat.com> Cc: libvir-list@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17target-i386: xsave: Add FP and SSE bits to x86_ext_save_areasEduardo Habkost
Instead of treating the FP and SSE bits as special cases, add them to the x86_ext_save_areas array. This will simplify the code that calculates the supported xsave components and the size of the xsave area. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17target-i386: Register properties for feature aliases manuallyEduardo Habkost
Instead of keeping the aliases inside the feature name arrays and require parsing the strings, just register alias properties manually. This simplifies the code for property registration and lookup. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17target-i386: Remove underscores from feat_names arraysEduardo Habkost
Instead of translating the feature name entries when adding property names, store the actual property names in the feature name array. For reference, here is the full list of functions that use FeatureWordInfo::feat_names: * x86_cpu_get_migratable_flags(): not affected, as it just check for non-NULL values. * report_unavailable_features(): informative only. It will start printing feature names with hyphens. * x86_cpu_list(): informative only. It will start printing feature names with hyphens * x86_cpu_register_feature_bit_props(): not affected, as it was already calling feat2prop(). Now we can remove the feat2prop() calls safely. So, the only user-visible effect of this patch are the new names being used in help and error messages for users. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17target-i386: Make plus_features/minus_features QOM-basedEduardo Habkost
Instead of using custom feature name lookup code for plus_features/minus_features, save the property names used in "[+-]feature" and use object_property_set_bool() to set them. We don't need a feat2prop() call because we now have alias properties for the old names containing underscores. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17target-i386: Register aliases for feature names with underscoresEduardo Habkost
Registering the actual names containing underscores as aliases will allow management software to be aware that the old compatibility names are suported, and will make feat2prop() calls unnecessary when using feature names. Also, this will help us avoid making the code support underscores on feature names that never had them in the first place. e.g. "+tsc_deadline" was never supported and doesn't need to be translated to "+tsc-deadline". In other word: this will require less magic translation of strings, and simple 1:1 match between the config options and actual QOM properties. Note that the underscores are still present in the FeatureWordInfo::feat_names arrays, because add_flagname_to_bitmaps() needs them to be kept. The next patches will remove add_flagname_to_bitmaps() and will allow us to finally remove the aliases from feat_names. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17target-i386: Disable VME by default with TCGEduardo Habkost
VME is already disabled automatically when using TCG. So, instead of pretending it is there when reporting CPU model data on query-cpu-* QMP commands (making every CPU model to be reported as not runnable), we can disable it by default on all CPU models when using TCG. Do that by adding a tcg_default_props array that will work like kvm_default_props. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17target-i386: List CPU models using subclass listEduardo Habkost
Instead of using the builtin_x86_defs array, use the QOM subclass list to list CPU models on "-cpu ?" and "query-cpu-definitions". Signed-off-by: Andreas Färber <afaerber@suse.de> [ehabkost: copied code from a patch by Andreas: "target-i386: QOM'ify CPU", from March 2012] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17tests: Add test case for x86 feature parsing compatibilityEduardo Habkost
Add a new test case to ensure the existing behavior of the feature parsing code will be kept. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17Merge remote-tracking branch 'remotes/gkurz/tags/for-upstream' into stagingPeter Maydell
This pull request contains: - a patch to add a vdc->reset() handler to virtio-9p - a bunch of patches to fix various memory leaks (thanks to Li Qiang) - some code cleanups for 9pfs # gpg: Signature made Mon 17 Oct 2016 16:01:46 BST # gpg: using DSA key 0x02FC3AEB0101DBC2 # gpg: Good signature from "Greg Kurz <groug@kaod.org>" # gpg: aka "Greg Kurz <groug@free.fr>" # gpg: aka "Greg Kurz <gkurz@fr.ibm.com>" # gpg: aka "Greg Kurz <gkurz@linux.vnet.ibm.com>" # gpg: aka "Gregory Kurz (Groug) <groug@free.fr>" # gpg: aka "Gregory Kurz (Cimai Technology) <gkurz@cimai.com>" # gpg: aka "Gregory Kurz (Meiosys Technology) <gkurz@meiosys.com>" # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: 2BD4 3B44 535E C0A7 9894 DBA2 02FC 3AEB 0101 DBC2 * remotes/gkurz/tags/for-upstream: 9pfs: fix memory leak in v9fs_write 9pfs: fix memory leak in v9fs_link 9pfs: fix memory leak in v9fs_xattrcreate 9pfs: fix information leak in xattr read virtio-9p: add reset handler 9pfs: only free completed request if not flushed 9pfs: drop useless check in pdu_free() 9pfs: use coroutine_fn annotation in hw/9pfs/9p.[ch] 9pfs: use coroutine_fn annotation in hw/9pfs/co*.[ch] 9pfs: fsdev: drop useless extern annotation for functions 9pfs: fix potential host memory leak in v9fs_read 9pfs: allocate space for guest originated empty strings Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-10-179pfs: fix memory leak in v9fs_writeLi Qiang
If an error occurs when marshalling the transfer length to the guest, the v9fs_write() function doesn't free an IO vector, thus leading to a memory leak. This patch fixes the issue. Signed-off-by: Li Qiang <liqiang6-s@360.cn> Reviewed-by: Greg Kurz <groug@kaod.org> [groug, rephrased the changelog] Signed-off-by: Greg Kurz <groug@kaod.org>
2016-10-179pfs: fix memory leak in v9fs_linkLi Qiang
The v9fs_link() function keeps a reference on the source fid object. This causes a memory leak since the reference never goes down to 0. This patch fixes the issue. Signed-off-by: Li Qiang <liqiang6-s@360.cn> Reviewed-by: Greg Kurz <groug@kaod.org> [groug, rephrased the changelog] Signed-off-by: Greg Kurz <groug@kaod.org>
2016-10-179pfs: fix memory leak in v9fs_xattrcreateLi Qiang
The 'fs.xattr.value' field in V9fsFidState object doesn't consider the situation that this field has been allocated previously. Every time, it will be allocated directly. This leads to a host memory leak issue if the client sends another Txattrcreate message with the same fid number before the fid from the previous time got clunked. Signed-off-by: Li Qiang <liqiang6-s@360.cn> Reviewed-by: Greg Kurz <groug@kaod.org> [groug, updated the changelog to indicate how the leak can occur] Signed-off-by: Greg Kurz <groug@kaod.org>
2016-10-179pfs: fix information leak in xattr readLi Qiang
9pfs uses g_malloc() to allocate the xattr memory space, if the guest reads this memory before writing to it, this will leak host heap memory to the guest. This patch avoid this. Signed-off-by: Li Qiang <liqiang6-s@360.cn> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Greg Kurz <groug@kaod.org>
2016-10-17virtio-9p: add reset handlerGreg Kurz
Virtio devices should implement the VirtIODevice->reset() function to perform necessary cleanup actions and to bring the device to a quiescent state. In the case of the virtio-9p device, this means: - emptying the list of active PDUs (i.e. draining all in-flight I/O) - freeing all fids (i.e. close open file descriptors and free memory) That's what this patch does. The reset handler first waits for all active PDUs to complete. Since completion happens in the QEMU global aio context, we just have to loop around aio_poll() until the active list is empty. The freeing part involves some actions to be performed on the backend, like closing file descriptors or flushing extended attributes to the underlying filesystem. The virtfs_reset() function already does the job: it calls free_fid() for all open fids not involved in an ongoing I/O operation. We are sure this is the case since we have drained the PDU active list. The current code implements all backend accesses with coroutines, but we want to stay synchronous on the reset path. We can either change the current code to be able to run when not in coroutine context, or create a coroutine context and wait for virtfs_reset() to complete. This patch goes for the latter because it results in simpler code. Note that we also need to create a dummy PDU because it is also an API to pass the FsContext pointer to all backend callbacks. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-10-179pfs: only free completed request if not flushedGreg Kurz
If a PDU has a flush request pending, the current code calls pdu_free() twice: 1) pdu_complete()->pdu_free() with pdu->cancelled set, which does nothing 2) v9fs_flush()->pdu_free() with pdu->cancelled cleared, which moves the PDU back to the free list. This works but it complexifies the logic of pdu_free(). With this patch, pdu_complete() only calls pdu_free() if no flush request is pending, i.e. qemu_co_queue_next() returns false. Since pdu_free() is now supposed to be called with pdu->cancelled cleared, the check in pdu_free() is dropped and replaced by an assertion. Signed-off-by: Greg Kurz <groug@kaod.org>
2016-10-179pfs: drop useless check in pdu_free()Greg Kurz
Out of the three users of pdu_free(), none ever passes a NULL pointer to this function. Signed-off-by: Greg Kurz <groug@kaod.org>
2016-10-179pfs: use coroutine_fn annotation in hw/9pfs/9p.[ch]Greg Kurz
All these functions either call the v9fs_co_* functions which have the coroutine_fn annotation, or pdu_complete() which calls qemu_co_queue_next(). Let's mark them to make it obvious they execute in coroutine context. Signed-off-by: Greg Kurz <groug@kaod.org>
2016-10-179pfs: use coroutine_fn annotation in hw/9pfs/co*.[ch]Greg Kurz
All these functions use the v9fs_co_run_in_worker() macro, and thus always call qemu_coroutine_self() and qemu_coroutine_yield(). Let's mark them to make it obvious they execute in coroutine context. Signed-off-by: Greg Kurz <groug@kaod.org>
2016-10-179pfs: fsdev: drop useless extern annotation for functionsGreg Kurz
Signed-off-by: Greg Kurz <groug@kaod.org>
2016-10-179pfs: fix potential host memory leak in v9fs_readLi Qiang
In 9pfs read dispatch function, it doesn't free two QEMUIOVector object thus causing potential memory leak. This patch avoid this. Signed-off-by: Li Qiang <liqiang6-s@360.cn> Signed-off-by: Greg Kurz <groug@kaod.org>
2016-10-179pfs: allocate space for guest originated empty stringsLi Qiang
If a guest sends an empty string paramater to any 9P operation, the current code unmarshals it into a V9fsString equal to { .size = 0, .data = NULL }. This is unfortunate because it can cause NULL pointer dereference to happen at various locations in the 9pfs code. And we don't want to check str->data everywhere we pass it to strcmp() or any other function which expects a dereferenceable pointer. This patch enforces the allocation of genuine C empty strings instead, so callers don't have to bother. Out of all v9fs_iov_vunmarshal() users, only v9fs_xattrwalk() checks if the returned string is empty. It now uses v9fs_string_size() since name.data cannot be NULL anymore. Signed-off-by: Li Qiang <liqiang6-s@360.cn> [groug, rewritten title and changelog, fix empty string check in v9fs_xattrwalk()] Signed-off-by: Greg Kurz <groug@kaod.org>
2016-10-17Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.8-20161017' ↵Peter Maydell
into staging ppc patch queue 2016-10-17 Highlights: * Significant rework of how PCI IO windows are placed for the pseries machine type * A number of extra tests added for ppc * Other tests clean up / fixed * Some cleanups to the XICS interrupt controller in preparation for the 'powernv' machine type A number of the test changes aren't strictly in ppc related code, but are included via my tree because they're primarily focused on improving test coverage for ppc. # gpg: Signature made Mon 17 Oct 2016 03:42:41 BST # gpg: using RSA key 0x6C38CACA20D9B392 # gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" # gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" # gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" # gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" # Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392 * remotes/dgibson/tags/ppc-for-2.8-20161017: spapr: Improved placement of PCI host bridges in guest memory map spapr_pci: Add a 64-bit MMIO window spapr: Adjust placement of PCI host bridge to allow > 1TiB RAM spapr_pci: Delegate placement of PCI host bridges to machine type libqos: Limit spapr-pci to 32-bit MMIO for now libqos: Correct error in PCI hole sizing for spapr libqos: Isolate knowledge of spapr memory map to qpci_init_spapr() ppc/xics: Split ICS into ics-base and ics class ppc/xics: Make the ICSState a list spapr: fix inheritance chain for default machine options target-ppc: implement vexts[bh]2w and vexts[bhw]2d tests/boot-sector: Increase time-out to 90 seconds tests/boot-sector: Use mkstemp() to create a unique file name tests/boot-sector: Use minimum length for the Forth boot script qtest: ask endianness of the target in qtest_init() tests: minor cleanups in usb-hcd-uhci-test Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-10-17Merge remote-tracking branch 'remotes/famz/tags/for-upstream' into stagingPeter Maydell
# gpg: Signature made Mon 17 Oct 2016 03:08:28 BST # gpg: using RSA key 0xCA35624C6A9171C6 # gpg: Good signature from "Fam Zheng <famz@redhat.com>" # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: 5003 7CB7 9706 0F76 F021 AD56 CA35 624C 6A91 71C6 * remotes/famz/tags/for-upstream: tests/docker/Makefile.include: add a generic docker-run target tests/docker: make test-mingw honour TARGET_LIST tests/docker: test-build script tests/docker: add travis dockerfile Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-10-17Merge remote-tracking branch 'remotes/juanquintela/tags/migration/20161014' ↵Peter Maydell
into staging migration/next for 20161014 # gpg: Signature made Fri 14 Oct 2016 16:24:13 BST # gpg: using RSA key 0xF487EF185872D723 # gpg: Good signature from "Juan Quintela <quintela@redhat.com>" # gpg: aka "Juan Quintela <quintela@trasno.org>" # Primary key fingerprint: 1899 FF8E DEBF 58CC EE03 4B82 F487 EF18 5872 D723 * remotes/juanquintela/tags/migration/20161014: docs/xbzrle: correction migrate: move max-bandwidth and downtime-limit to migrate_set_parameter migration: Fix seg with missing port migration/postcopy: Explicitly disallow huge pages RAMBlocks: Store page size Postcopy vs xbzrle: Don't send xbzrle pages once in postcopy [for 2.8] migrate: Fix bounds check for migration parameters in migration.c migrate: Use boxed qapi for migrate-set-parameters migrate: Share common MigrationParameters struct migrate: Fix cpu-throttle-increment regression in HMP migration/rdma: Don't flag an error when we've been told about one migration: Make failed migration load set file error migration/rdma: Pass qemu_file errors across link migration: Report values for comparisons migration: report an error giving the failed field Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-10-17tests/docker/Makefile.include: add a generic docker-run targetAlex Bennée
This re-factors the docker makefile to include a docker-run target which can be controlled entirely from environment variables specified on the make command line. This allows us to run against any given docker image we may have in our repository, for example: make docker-run TEST="test-quick" IMAGE="debian:arm64" \ EXECUTABLE=./aarch64-linux-user/qemu-aarch64 The existing docker-foo@bar targets still work but the inline verification has been dropped because we already don't hit that due to other pattern rules in rules.mak. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20161011161625.9070-5-alex.bennee@linaro.org> Message-Id: <20161011161625.9070-6-alex.bennee@linaro.org> [Squash in the verification removal patch. - Fam] Signed-off-by: Fam Zheng <famz@redhat.com>
2016-10-17tests/docker: make test-mingw honour TARGET_LISTAlex Bennée
The other builders honour this variable, so should the mingw build. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20161011161625.9070-4-alex.bennee@linaro.org> Signed-off-by: Fam Zheng <famz@redhat.com>
2016-10-17tests/docker: test-build scriptAlex Bennée
Much like test-quick but only builds. This is useful for some of the build targets like ThreadSanitizer that don't yet pass "make check". Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20161011161625.9070-3-alex.bennee@linaro.org> Signed-off-by: Fam Zheng <famz@redhat.com>
2016-10-17tests/docker: add travis dockerfileAlex Bennée
This target grabs the latest Travis containers from their repository at quay.io and then installs QEMU's build dependencies. With this it is possible to run on broadly the same setup as they have on travis-ci.org. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20161011161625.9070-2-alex.bennee@linaro.org> Signed-off-by: Fam Zheng <famz@redhat.com>
2016-10-16spapr: Improved placement of PCI host bridges in guest memory mapDavid Gibson
Currently, the MMIO space for accessing PCI on pseries guests begins at 1 TiB in guest address space. Each PCI host bridge (PHB) has a 64 GiB chunk of address space in which it places its outbound PIO and 32-bit and 64-bit MMIO windows. This scheme as several problems: - It limits guest RAM to 1 TiB (though we have a limited fix for this now) - It limits the total MMIO window to 64 GiB. This is not always enough for some of the large nVidia GPGPU cards - Putting all the windows into a single 64 GiB area means that naturally aligning things within there will waste more address space. In addition there was a miscalculation in some of the defaults, which meant that the MMIO windows for each PHB actually slightly overran the 64 GiB region for that PHB. We got away without nasty consequences because the overrun fit within an unused area at the beginning of the next PHB's region, but it's not pretty. This patch implements a new scheme which addresses those problems, and is also closer to what bare metal hardware and pHyp guests generally use. Because some guest versions (including most current distro kernels) can't access PCI MMIO above 64 TiB, we put all the PCI windows between 32 TiB and 64 TiB. This is broken into 1 TiB chunks. The first 1 TiB contains the PIO (64 kiB) and 32-bit MMIO (2 GiB) windows for all of the PHBs. Each subsequent TiB chunk contains a naturally aligned 64-bit MMIO window for one PHB each. This reduces the number of allowed PHBs (without full manual configuration of all the windows) from 256 to 31, but this should still be plenty in practice. We also change some of the default window sizes for manually configured PHBs to saner values. Finally we adjust some tests and libqos so that it correctly uses the new default locations. Ideally it would parse the device tree given to the guest, but that's a more complex problem for another time. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2016-10-16spapr_pci: Add a 64-bit MMIO windowDavid Gibson
On real hardware, and under pHyp, the PCI host bridges on Power machines typically advertise two outbound MMIO windows from the guest's physical memory space to PCI memory space: - A 32-bit window which maps onto 2GiB..4GiB in the PCI address space - A 64-bit window which maps onto a large region somewhere high in PCI address space (traditionally this used an identity mapping from guest physical address to PCI address, but that's not always the case) The qemu implementation in spapr-pci-host-bridge, however, only supports a single outbound MMIO window, however. At least some Linux versions expect the two windows however, so we arranged this window to map onto the PCI memory space from 2 GiB..~64 GiB, then advertised it as two contiguous windows, the "32-bit" window from 2G..4G and the "64-bit" window from 4G..~64G. This approach means, however, that the 64G window is not naturally aligned. In turn this limits the size of the largest BAR we can map (which does have to be naturally aligned) to roughly half of the total window. With some large nVidia GPGPU cards which have huge memory BARs, this is starting to be a problem. This patch adds true support for separate 32-bit and 64-bit outbound MMIO windows to the spapr-pci-host-bridge implementation, each of which can be independently configured. The 32-bit window always maps to 2G.. in PCI space, but the PCI address of the 64-bit window can be configured (it defaults to the same as the guest physical address). So as not to break possible existing configurations, as long as a 64-bit window is not specified, a large single window can be specified. This will appear the same way to the guest as the old approach, although it's now implemented by two contiguous memory regions rather than a single one. For now, this only adds the possibility of 64-bit windows. The default configuration still uses the legacy mode. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2016-10-16spapr: Adjust placement of PCI host bridge to allow > 1TiB RAMDavid Gibson
Currently the default PCI host bridge for the 'pseries' machine type is constructed with its IO windows in the 1TiB..(1TiB + 64GiB) range in guest memory space. This means that if > 1TiB of guest RAM is specified, the RAM will collide with the PCI IO windows, causing serious problems. Problems won't be obvious until guest RAM goes a bit beyond 1TiB, because there's a little unused space at the bottom of the area reserved for PCI, but essentially this means that > 1TiB of RAM has never worked with the pseries machine type. This patch fixes this by altering the placement of PHBs on large-RAM VMs. Instead of always placing the first PHB at 1TiB, it is placed at the next 1 TiB boundary after the maximum RAM address. Technically, this changes behaviour in a migration-breaking way for existing machines with > 1TiB maximum memory, but since having > 1 TiB memory was broken anyway, this seems like a reasonable trade-off. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2016-10-16spapr_pci: Delegate placement of PCI host bridges to machine typeDavid Gibson
The 'spapr-pci-host-bridge' represents the virtual PCI host bridge (PHB) for a PAPR guest. Unlike on x86, it's routine on Power (both bare metal and PAPR guests) to have numerous independent PHBs, each controlling a separate PCI domain. There are two ways of configuring the spapr-pci-host-bridge device: first it can be done fully manually, specifying the locations and sizes of all the IO windows. This gives the most control, but is very awkward with 6 mandatory parameters. Alternatively just an "index" can be specified which essentially selects from an array of predefined PHB locations. The PHB at index 0 is automatically created as the default PHB. The current set of default locations causes some problems for guests with large RAM (> 1 TiB) or PCI devices with very large BARs (e.g. big nVidia GPGPU cards via VFIO). Obviously, for migration we can only change the locations on a new machine type, however. This is awkward, because the placement is currently decided within the spapr-pci-host-bridge code, so it breaks abstraction to look inside the machine type version. So, this patch delegates the "default mode" PHB placement from the spapr-pci-host-bridge device back to the machine type via a public method in sPAPRMachineClass. It's still a bit ugly, but it's about the best we can do. For now, this just changes where the calculation is done. It doesn't change the actual location of the host bridges, or any other behaviour. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2016-10-16libqos: Limit spapr-pci to 32-bit MMIO for nowDavid Gibson
Currently the functions in pci-spapr.c (like pci-pc.c on which it's based) don't distinguish between 32-bit and 64-bit PCI MMIO. At the moment, the qemu side implementation is a bit weird and has a single MMIO window straddling 32-bit and 64-bit regions, but we're likely to change that in future. In any case, pci-pc.c - and therefore the testcases using PCI - only handle 32-bit MMIOs for now. For spapr despite whatever changes might happen with the MMIO windows, the 32-bit window is likely to remain at 2..4 GiB in PCI space. So, explicitly limit pci-spapr.c to 32-bit MMIOs for now, we can add 64-bit MMIO support back in when and if we need it. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2016-10-16libqos: Correct error in PCI hole sizing for spaprDavid Gibson
In pci-spapr.c (as in pci-pc.c from which it was derived), the pci_hole_start/pci_hole_size and pci_iohole_start/pci_iohole_size pairs[1] essentially define the region of PCI (not CPU) addresses in which MMIO or PIO BARs respectively will be allocated. The size value is relative to the start value. But in pci-spapr.c it is set to the entire size of the window supported by the (emulated) hardware, but the start values are *not* at the beginning of the emulated windows. That means if you tried to map enough PCI BARs, we'd messily overrun the IO windows, instead of failing in iomap as we should. This patch corrects this by calculating the hole sizes from the location of the window in PCI space and the hole start. [1] Those are bad names, but that's a problem for another time. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2016-10-16libqos: Isolate knowledge of spapr memory map to qpci_init_spapr()David Gibson
The libqos code for accessing PCI on the spapr machine type uses IOBASE() and MMIOBASE() macros to determine the address in the CPU memory map of the windows to PCI address space. This is a detail of the implementation of PCI in the machine type, it's not specified by the PAPR standard. Real guests would get the addresses of the PCI windows from the device tree. Finding the device tree in libqos would be awkward, but we can at least localize this knowledge of the implementation to the init function, saving it in the QPCIBusSPAPR structure for use by the accessors. That leaves only one place to fix if we alter the location of the PCI windows, as we're planning to do. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2016-10-14ppc/xics: Split ICS into ics-base and ics classBenjamin Herrenschmidt
The existing implementation remains same and ics-base is introduced. The type name "ics" is retained, and all the related functions renamed as ics_simple_* This will allow different implementations for the source controllers such as the MSI support of PHB3 on Power8 which uses in-memory state tables for example. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> [ clg: added ICS_BASE_GET_CLASS and related fixes, based on : http://patchwork.ozlabs.org/patch/646010/ ] Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-14ppc/xics: Make the ICSState a listBenjamin Herrenschmidt
Instead of an array of fixed sized blocks, use a list, as we will need to have sources with variable number of interrupts. SPAPR only uses a single entry. Native will create more. If performance becomes an issue we can add some hashed lookup but for now this will do fine. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [ move the initialization of list to xics_common_initfn, restore xirr_owner after migration and move restoring to icp_post_load] Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> [ clg: removed the icp_post_load() changes from nikunj patchset v3: http://patchwork.ozlabs.org/patch/646008/ ] Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-14spapr: fix inheritance chain for default machine optionsMichael Roth
Rather than machine instances having backward-compatible option defaults that need to be repeatedly re-enabled for every new machine type we introduce, we set the defaults appropriate for newer machine types, then add code to explicitly disable instance options as needed to maintain compatibility with older machine types. Currently pseries-2.5 does not inherit from pseries-2.6 in this fashion, which is okay at the moment since we do not have any instance compatibility options for pseries-2.6+ currently. We will make use of this in future patches though, so fix it here. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> [dwg: Extended to make 2.7 inherit from 2.8 as well] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-14target-ppc: implement vexts[bh]2w and vexts[bhw]2dNikunj A Dadhania
Vector Extend Sign Instructions: vextsb2w: Vector Extend Sign Byte To Word vextsh2w: Vector Extend Sign Halfword To Word vextsb2d: Vector Extend Sign Byte To Doubleword vextsh2d: Vector Extend Sign Halfword To Doubleword vextsw2d: Vector Extend Sign Word To Doubleword Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-14tests/boot-sector: Increase time-out to 90 secondsThomas Huth
Since the PXE tester runs rather slow on ppc64 with tcg, there is a chance that we hit the 60 seconds timeout on machines that have a heavy CPU load. So let's increase the timeout to ease the situation. Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-14tests/boot-sector: Use mkstemp() to create a unique file nameThomas Huth
The pxe-test is run for three different targets now (x86_64, i386 and ppc64), and the bios-tables-test is run for two targets (x86_64 and i386). But each of the tests is using an invariant name for the disk image with the boot sector code - so if the tests are running in parallel, there is a race condition that they destroy the disk image of a parallel test program. Let's use mkstemp() to create unique temporary files here instead - and since mkstemp() is returning an integer file descriptor instead of a FILE pointer, we also switch the fwrite() and fclose() to write() and close() instead. Reported-by: Sascha Silbe <x-qemu@se-silbe.de> Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-14tests/boot-sector: Use minimum length for the Forth boot scriptThomas Huth
The pxe-test is quite slow on ppc64 with tcg. We can speed it up a little bit by decreasing the size of the file that has to be loaded via TFTP. Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-14qtest: ask endianness of the target in qtest_init()Laurent Vivier
The target endianness is not deduced anymore from the architecture name but asked directly to the guest, using a new qtest command: "endianness". As it can't change (this is the value of TARGET_WORDS_BIGENDIAN), we store it to not have to ask every time we want to know if we have to byte-swap a value. Signed-off-by: Laurent Vivier <lvivier@redhat.com> CC: Greg Kurz <groug@kaod.org> CC: David Gibson <david@gibson.dropbear.id.au> CC: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>