aboutsummaryrefslogtreecommitdiff
path: root/hw
AgeCommit message (Collapse)Author
2017-05-25vhost-user: pass message as a pointer to process_message_reply()Maxime Coquelin
process_message_reply() was recently updated to get full message content instead of only its request field. There is no need to copy all the struct content into the stack, so just pass its pointer as const. Reviewed-by: Jens Freimann <jfreiman@redhat.com> Reviewed-by: Zhiyong Yang <zhiyong.yang@intel.com> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2017-05-25virtio_net: Bypass backends for MTU feature negotiationMaxime Coquelin
This patch adds a new internal "x-mtu-bypass-backend" property to bypass backends for MTU feature negotiation. When this property is set, the MTU feature is negotiated as soon as supported by the guest and a MTU value is set via the host_mtu parameter. In case the backend advertises the feature (e.g. DPDK's vhost-user backend), the feature negotiation is propagated down to the backend. When this property is not set, the backend has to support the MTU feature for its negotiation to succeed. For compatibility purpose, this property is disabled for machine types v2.9 and older. Cc: Aaron Conole <aconole@redhat.com> Suggested-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Vlad Yasevich <vyasevic@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-05-25intel_iommu: support passthrough (PT)Peter Xu
Hardware support for VT-d device passthrough. Although current Linux can live with iommu=pt even without this, but this is faster than when using software passthrough. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Liu, Yi L <yi.l.liu@linux.intel.com> Reviewed-by: Jason Wang <jasowang@redhat.com>
2017-05-25intel_iommu: allow dev-iotlb context entry conditionallyPeter Xu
When device-iotlb is not specified, we should fail this check. A new function vtd_ce_type_check() is introduced. While I'm at it, clean up the vtd_dev_to_context_entry() a bit - replace many "else if" usage into direct if check. That'll make the logic more clear. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Jason Wang <jasowang@redhat.com>
2017-05-25intel_iommu: use IOMMU_ACCESS_FLAG()Peter Xu
We have that now, so why not use it. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Jason Wang <jasowang@redhat.com>
2017-05-25intel_iommu: provide vtd_ce_get_type()Peter Xu
Helper to fetch VT-d context entry type. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Jason Wang <jasowang@redhat.com>
2017-05-25intel_iommu: renaming context entry helpersPeter Xu
The old names are too long and less ordered. Let's start to use vtd_ce_*() as a pattern. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Jason Wang <jasowang@redhat.com>
2017-05-25x86-iommu: use DeviceClass propertiesPeter Xu
No reason to keep tens of lines if we can do it actually far shorter. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Jason Wang <jasowang@redhat.com>
2017-05-25memory: remove the last param in memory_region_iommu_replay()Peter Xu
We were always passing in that one as "false" to assume that's an read operation, and we also assume that IOMMU translation would always have that read permission. A better permission would be IOMMU_NONE since the replay is after all not a real read operation, but just a page table rebuilding process. CC: David Gibson <david@gibson.dropbear.id.au> CC: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Jason Wang <jasowang@redhat.com>
2017-05-25memory: tune last param of iommu_ops.translate()Peter Xu
This patch converts the old "is_write" bool into IOMMUAccessFlags. The difference is that "is_write" can only express either read/write, but sometimes what we really want is "none" here (neither read nor write). Replay is an good example - during replay, we should not check any RW permission bits since thats not an actual IO at all. CC: Paolo Bonzini <pbonzini@redhat.com> CC: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Jason Wang <jasowang@redhat.com>
2017-05-259pfs: local: metadata file for the VirtFS rootGreg Kurz
When using the mapped-file security, credentials are stored in a metadata directory located in the parent directory. This is okay for all paths with the notable exception of the root path, since we don't want and probably can't create a metadata directory above the virtfs directory on the host. This patch introduces a dedicated metadata file, sitting in the virtfs root for this purpose. It relies on the fact that the "." name necessarily refers to the virtfs root. As for the metadata directory, we don't want the client to see this file. The current code only cares for readdir() but there are many other places to fix actually. The filtering logic is hence put in a separate function. Before: # ls -ld drwxr-xr-x. 3 greg greg 4096 May 5 12:49 . # chown root.root . chown: changing ownership of '.': Is a directory # ls -ld drwxr-xr-x. 3 greg greg 4096 May 5 12:49 . After: # ls -ld drwxr-xr-x. 3 greg greg 4096 May 5 12:49 . # chown root.root . # ls -ld drwxr-xr-x. 3 root root 4096 May 5 12:50 . and from the host: ls -al .virtfs_metadata_root -rwx------. 1 greg greg 26 May 5 12:50 .virtfs_metadata_root $ cat .virtfs_metadata_root virtfs.uid=0 virtfs.gid=0 Reported-by: Leo Gaspard <leo@gaspard.io> Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Eric Blake <eblake@redhat.com> Tested-by: Leo Gaspard <leo@gaspard.io> [groug: work around a patchew false positive in local_set_mapped_file_attrat()]
2017-05-259pfs: local: simplify file openingGreg Kurz
The logic to open a path currently sits between local_open_nofollow() and the relative_openat_nofollow() helper, which has no other user. For the sake of clarity, this patch moves all the code of the helper into its unique caller. While here we also: - drop the code to skip leading "/" because the backend isn't supposed to pass anything but relative paths without consecutive slashes. The assert() is kept because we really don't want a buggy backend to pass an absolute path to openat(). - use strchrnul() to get a simpler code. This is ok since virtfs is for linux+glibc hosts only. - don't dup() the initial directory and add an assert() to ensure we don't return the global mountfd to the caller. BTW, this would mean that the caller passed an empty path, which isn't supposed to happen either. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Eric Blake <eblake@redhat.com> [groug: fixed typos in changelog]
2017-05-259pfs: local: resolve special directories in pathsGreg Kurz
When using the mapped-file security mode, the creds of a path /foo/bar are stored in the /foo/.virtfs_metadata/bar file. This is okay for all paths unless they end with '.' or '..', because we cannot create the corresponding file in the metadata directory. This patch ensures that '.' and '..' are resolved in all paths. The core code only passes path elements (no '/') to the backend, with the notable exception of the '/' path, which refers to the virtfs root. This patch preserves the current behavior of converting it to '.' so that it can be passed to "*at()" syscalls ('/' would mean the host root). Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Eric Blake <eblake@redhat.com>
2017-05-259pfs: check return value of v9fs_co_name_to_path()Greg Kurz
These v9fs_co_name_to_path() call sites have always been around. I guess no care was taken to check the return value because the name_to_path operation could never fail at the time. This is no longer true: the handle and synth backends can already fail this operation, and so will the local backend soon. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Eric Blake <eblake@redhat.com>
2017-05-259pfs: assume utimensat() and futimens() are presentGreg Kurz
The utimensat() and futimens() syscalls have been around for ages (ie, glibc 2.6 and linux 2.6.22), and the decision was already taken to switch to utimensat() anyway when fixing CVE-2016-9602 in 2.9. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Eric Blake <eblake@redhat.com>
2017-05-259pfs: local: fix unlink of alien files in mapped-file modeGreg Kurz
When trying to remove a file from a directory, both created in non-mapped mode, the file remains and EBADF is returned to the guest. This is a regression introduced by commit "df4938a6651b 9pfs: local: unlinkat: don't follow symlinks" when fixing CVE-2016-9602. It changed the way we unlink the metadata file from ret = remove("$dir/.virtfs_metadata/$name"); if (ret < 0 && errno != ENOENT) { /* Error out */ } /* Ignore absence of metadata */ to fd = openat("$dir/.virtfs_metadata") unlinkat(fd, "$name") if (ret < 0 && errno != ENOENT) { /* Error out */ } /* Ignore absence of metadata */ If $dir was created in non-mapped mode, openat() fails with ENOENT and we pass -1 to unlinkat(), which fails in turn with EBADF. We just need to check the return of openat() and ignore ENOENT, in order to restore the behaviour we had with remove(). Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Eric Blake <eblake@redhat.com> [groug: rewrote the comments as suggested by Eric]
2017-05-259pfs: drop pdu_push_and_notify()Greg Kurz
Only pdu_complete() needs to notify the client that a request has completed. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
2017-05-25virtio-9p/xen-9p: move 9p specific bits to core 9p codeGreg Kurz
These bits aren't related to the transport so let's move them to the core code. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
2017-05-25xics: add unrealize handlerGreg Kurz
Now that ICPState objects get finalized on CPU unplug, we should unregister reset handlers as well to avoid a QEMU crash at machine reset time. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-25hw/ppc/spapr.c: recover pending LMB unplug info in spapr_lmb_releaseDaniel Henrique Barboza
When a LMB hot unplug starts, the current DRC LMB status is stored at spapr->pending_dimm_unplugs QTAILQ. This queue isn't migrated, thus if a migration occurs in the middle of a LMB unplug the spapr_lmb_release callback will lost track of the LMB unplug progress. This patch implements a new recover function spapr_recover_pending_dimm_state that is used inside spapr_lmb_release to recover this DRC LMB release status that is lost during the migration. Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> [dwg: Minor stylistic changes, simplify error handling] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-25hw/ppc: migrating the DRC state of hotplugged devicesDaniel Henrique Barboza
In pseries, a firmware abstraction called Dynamic Reconfiguration Connector (DRC) is used to assign a particular dynamic resource to the guest and provide an interface to manage configuration/removal of the resource associated with it. In other words, DRC is the 'plugged state' of a device. Before this patch, DRC wasn't being migrated. This causes post-migration problems due to DRC state mismatch between source and target. The DRC state of a device X in the source might change, while in the target the DRC state of X is still fresh. When migrating the guest, X will not have the same hotplugged state as it did in the source. This means that we can't hot unplug X in the target after migration is completed because its DRC state is not consistent. https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1677552 is one bug that is caused by this DRC state mismatch between source and target. To migrate the DRC state, we defined the VMStateDescription struct for spapr_drc to enable the transmission of spapr_drc state in migration. Not all the elements in the DRC state are migrated - only those that can be modified by guest actions or device add/remove operations: - 'isolation_state', 'allocation_state' and 'indicator_state' are involved in the DR state transition diagram from PAPR+ 2.7, 13.4; - 'configured', 'signalled', 'awaiting_release' and 'awaiting_allocation' are needed in attaching and detaching devices; - 'indicator_state' provides users with hardware state information. These are the DRC elements that are migrated. In this patch the DRC state is migrated for PCI, LMB and CPU connector types. At this moment there is no support to migrate DRC for the PHB (PCI Host Bridge) type. In the 'realize' function the DRC is registered using vmstate_register, similar to what hw/ppc/spapr_iommu.c does in 'spapr_tce_table_realize'. This approach works because DRCs are bus-less and do not sit on a BusClass that implements bc->get_dev_path, so as a fallback the VMSD gets identified via "spapr_drc"/get_index(drc). Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-25hw/ppc: removing drc->detach_cb and drc->detach_cb_opaqueDaniel Henrique Barboza
The pointer drc->detach_cb is being used as a way of informing the detach() function inside spapr_drc.c which cb to execute. This information can also be retrieved simply by checking drc->type and choosing the right callback based on it. In this context, detach_cb is redundant information that must be managed. After the previous spapr_lmb_release change, no detach_cb_opaques are being used by any of the three callbacks functions. This is yet another information that is now unused and, on top of that, can't be migrated either. This patch makes the following changes: - removal of detach_cb_opaque. the 'opaque' argument was removed from the callbacks and from the detach() function of sPAPRConnectorClass. The attribute detach_cb_opaque of sPAPRConnector was removed. - removal of detach_cb from the detach() call. The function pointer detach_cb of sPAPRConnector was removed. detach() now uses a switch(drc->type) to execute the apropriate callback. To achieve this, spapr_core_release, spapr_lmb_release and spapr_phb_remove_pci_device_cb callbacks were made public to be visible inside detach(). Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-25hw/ppc/spapr.c: adding pending_dimm_unplugs to sPAPRMachineStateDavid Gibson
The LMB DRC release callback, spapr_lmb_release(), uses an opaque parameter, a sPAPRDIMMState struct that stores the current LMBs that are allocated to a DIMM (nr_lmbs). After each call to this callback, the nr_lmbs is decremented by one and, when it reaches zero, the callback proceeds with the qdev calls to hot unplug the LMB. Using drc->detach_cb_opaque is problematic because it can't be migrated in the future DRC migration work. This patch makes the following changes to eliminate the usage of this opaque callback inside spapr_lmb_release: - sPAPRDIMMState was moved from spapr.c and added to spapr.h. A new attribute called 'addr' was added to it. This is used as an unique identifier to associate a sPAPRDIMMState to a PCDIMM element. - sPAPRMachineState now hosts a new QTAILQ called 'pending_dimm_unplugs'. This queue of sPAPRDIMMState elements will store the DIMM state of DIMMs that are currently going under an unplug process. - spapr_lmb_release() will now retrieve the nr_lmbs value by getting the correspondent sPAPRDIMMState. A helper function called spapr_dimm_get_address was created to fetch the address of a PCDIMM device inside spapr_lmb_release. When nr_lmbs reaches zero and the callback proceeds with the qdev hot unplug calls, the sPAPRDIMMState struct is removed from spapr->pending_dimm_unplugs. After these changes, the opaque argument for spapr_lmb_release is now unused and is passed as NULL inside spapr_del_lmbs. This and the other opaque arguments can now be safely removed from the code. As an additional cleanup made by this patch, the spapr_del_lmbs function was merged with spapr_memory_unplug_request. The former was being called only by the latter and both were small enough to fit one single function. Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> [dwg: Minor stylistic cleanups] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-24Merge remote-tracking branch 'cohuck/tags/s390x-20170523' into stagingStefan Hajnoczi
s390x updates: - support for vfio-ccw to passthrough channel devices - allow ccw bios to boot from scsi generic devices - bugfix for initial reset # gpg: Signature made Tue 23 May 2017 12:02:24 PM BST # gpg: using RSA key 0xDECF6B93C6F02FAF # gpg: Good signature from "Cornelia Huck <conny@cornelia-huck.de>" # gpg: aka "Cornelia Huck <cohuck@kernel.org>" # gpg: aka "Cornelia Huck <cornelia.huck@de.ibm.com>" # gpg: aka "Cornelia Huck <huckc@linux.vnet.ibm.com>" # Primary key fingerprint: C3D0 D66D C362 4FF6 A8C0 18CE DECF 6B93 C6F0 2FAF * cohuck/tags/s390x-20170523: (21 commits) s390/kvm: do not reset riccb on initial cpu reset MAINTAINERS: Add vfio-ccw maintainer vfio/ccw: update sense data if a unit check is pending s390x/css: ccw translation infrastructure s390x/css: introduce and realize ccw-request callback vfio/ccw: get irqs info and set the eventfd fd vfio/ccw: get io region info vfio/ccw: vfio based subchannel passthrough driver s390x/css: device support for s390-ccw passthrough s390x/css: realize css_create_sch s390x/css: realize css_sch_build_schib s390x/css: add s390-squash-mcss machine option linux-headers: update pc-bios/s390-ccw.img: rebuild image pc-bios/s390-ccw: Build a reasonable max_sectors limit pc-bios/s390-ccw: Get Block Limits VPD device data pc-bios/s390-ccw: Get list of supported VPD pages pc-bios/s390-ccw: Refactor scsi_inquiry function pc-bios/s390-ccw: Break up virtio-scsi read into multiples pc-bios/s390-ccw: Move SCSI block factor to outer read ... Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-05-24spapr: add pre_plug function for memoryLaurent Vivier
This allows to manage errors before the memory has started to be hotplugged. We already have the function for the CPU cores. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> [dwg: Fixed a couple of style nits] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-24pseries: Restore support for total vcpus not a multiple of threads-per-core ↵David Gibson
for old machine types As of pseries-2.7 and later, we require the total number of guest vcpus to be a multiple of the threads-per-core. pseries-2.6 and earlier machine types, however, are supposed to allow this for the sake of migration from old qemu versions which allowed this. Unfortunately, 8149e29 "pseries: Enforce homogeneous threads-per-core" broke this by not considering the old machine type case. This fixes it by only applying the check when the machine type supports hotpluggable cpus. By not-entirely-coincidence, that corresponds to the same time when we started enforcing total threads being a multiple of threads-per-core. Fixes: 8149e2992f7811355cc34721b79d69d1a3a667dd Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Greg Kurz <groug@kaod.org>
2017-05-24pseries: Split CAS PVR negotiation out into a separate functionDavid Gibson
Guests of the qemu machine type go through a feature negotiation process known as "client architecture support" (CAS) during early boot. This does a number of things, one of which is finding a CPU compatibility mode which can be supported by both guest and host. In fact the CPU negotiation is probably the single most complex part of the CAS process, so this splits it out into a helper function. We've recently made some mistakes in maintaining backward compatibility for old machine types here. Splitting this out will also make it easier to fix this. This also adds a possibly useful error message if the negotiation fails (i.e. if there isn't a CPU mode that's suitable for both guest and host). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org>
2017-05-24spapr: fix error reporting in xics_system_init()Greg Kurz
If the user explicitely asked for kernel-irqchip support and "xics-kvm" initialization fails, we shouldn't fallback to emulated "xics" as we do now. It is also awkward to print an error message when we have an errp pointer argument. Let's use the errp argument to report the error and let the caller decide. This simplifies the code as we don't need a local Error * here. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-24spapr_cpu_core: drop reference on ICP object during CPU realizationGreg Kurz
When a piece of code allocates an object, it implicitely gets a reference on it. If it then makes that object a child property of another object, it should drop its own reference at some point otherwise the child object can never be finalized. The current code hence leaks one ICP object per CPU when hot-removing a core. Failing to add a newly allocated ICP object to the CPU is a bug. While here, let's ensure QEMU aborts if this ever happens. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-24hw/ppc/spapr_events.c: removing 'exception' from sPAPREventLogEntryDaniel Henrique Barboza
Currenty we do not have any RTAS event that is reported by the event-scan interface. The existing events, RTAS_LOG_TYPE_EPOW and RTAS_LOG_TYPE_HOTPLUG, are being reported by the check-exception interface and, as such, marked as 'exception=true'. Commit 79853e18d9, 'spapr_events: event-scan RTAS interface', added the event_scan interface because the guest kernel requires it to initialize other required interfaces. It is acting since then as a stub because no events that would be reported by it were added since then. However, the existence of the 'exception' boolean adds an unnecessary load in the future migration of the pending_events, sPAPREventLogEntry QTAILQ that hosts the pending RTAS events. To make the code cleaner and ease the future migration changes, this patch makes the following changes: - remove the 'exception' boolean that filter these events. There is nothing to filter since all events are reported by check-exception; - functions rtas_event_log_queue, rtas_event_log_dequeue and rtas_event_log_contains don't receive the 'exception' boolean as parameter; - event_scan function was simplified. It was calling 'rtas_event_log_dequeue(mask, false)' that was always returning 'NULL' because we have no events that are created with exception=false, thus in the end it would execute a jump to 'out_no_events' all the time. The function now assumes that this will always be the case and all the remaining logic were deleted. In the future, when or if we add new RTAS events that should be reported with the event_scan interface, we can refer to the changes made in this patch to add the event_scan logic back. Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-24spapr: ensure core_slot isn't NULL in spapr_core_unplug()Greg Kurz
If we go that far on the path of hot-removing a core and we find out that the core-id is invalid, then we have a serious bug. Let's make it explicit with an assert() instead of dereferencing a NULL pointer. This fixes Coverity issue CID 1375404. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-24xics_kvm: cache already enabled vCPU idsGreg Kurz
Since commit a45863bda90d ("xics_kvm: Don't enable KVM_CAP_IRQ_XICS if already enabled"), we were able to re-hotplug a vCPU that had been hot- unplugged ealier, thanks to a boolean flag in ICPState that we set when enabling KVM_CAP_IRQ_XICS. This could work because the lifecycle of all ICPState objects was the same as the machine. Commit 5bc8d26de20c ("spapr: allocate the ICPState object from under sPAPRCPUCore") broke this assumption and now we always pass a freshly allocated ICPState object (ie, with the flag unset) to icp_kvm_cpu_setup(). This cause re-hotplug to fail with: Unable to connect CPU8 to kernel XICS: Device or resource busy Let's fix this by caching all the vCPU ids for which KVM_CAP_IRQ_XICS was enabled. This also drops the now useless boolean flag from ICPState. Reported-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Greg Kurz <groug@kaod.org> Tested-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-24spapr: Consolidate HPT freeing code into a routineBharata B Rao
Consolidate the code that frees HPT into a separate routine spapr_free_hpt() as the same chunk of code is called from two places. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-24spapr-cpu-core: release ICP object when realization failsGreg Kurz
While here we introduce a single error path to avoid code duplication. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-24spapr: sanitize error handling in spapr_ics_create()Greg Kurz
The spapr_ics_create() function handles errors in a rather convoluted way, with two local Error * variables. Moreover, failing to parent the ICS object to the machine should be considered as a bug but it is currently ignored. This patch addresses both issues. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-24ppc/xics: simplify prototype of xics_spapr_init()Greg Kurz
This function only does hypercall and RTAS-call registration, and thus never returns an error. This patch adapt the prototype to reflect that. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-23Merge remote-tracking branch 'jasowang/tags/net-pull-request' into stagingStefan Hajnoczi
# gpg: Signature made Tue 23 May 2017 03:27:37 AM BST # gpg: using RSA key 0xEF04965B398D6211 # gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>" # Primary key fingerprint: 215D 46F4 8246 689E C77F 3562 EF04 965B 398D 6211 * jasowang/tags/net-pull-request: e1000e: Fix ICR "Other" causes clear logic net/filter-rewriter: Remove unused option in filter-rewriter net/filter-mirror.c: Rename filter_mirror_send() and fix codestyle net/filter-mirror.c: Remove duplicate check code. hmp / net: Mark host_net_add/remove as deprecated COLO-compare: Improve tcp compare trace event readability virtio-net: fix wild pointer when remove virtio-net queues net/dump: Issue a warning for the deprecated "-net dump" net/tap: Replace tap-haiku.c and tap-aix.c by a generic tap-stub.c Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-05-23shutdown: Add source information to SHUTDOWN and RESETEric Blake
Time to wire up all the call sites that request a shutdown or reset to use the enum added in the previous patch. It would have been less churn to keep the common case with no arguments as meaning guest-triggered, and only modified the host-triggered code paths, via a wrapper function, but then we'd still have to audit that I didn't miss any host-triggered spots; changing the signature forces us to double-check that I correctly categorized all callers. Since command line options can change whether a guest reset request causes an actual reset vs. a shutdown, it's easy to also add the information to reset requests. Signed-off-by: Eric Blake <eblake@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> [ppc parts] Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> [SPARC part] Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> [s390x parts] Message-Id: <20170515214114.15442-5-eblake@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
2017-05-23shutdown: Prepare for use of an enum in reset/shutdown_requestEric Blake
We want to track why a guest was shutdown; in particular, being able to tell the difference between a guest request (such as ACPI request) and host request (such as SIGINT) will prove useful to libvirt. Since all requests eventually end up changing shutdown_requested in vl.c, the logical change is to make that value track the reason, rather than its current 0/1 contents. Since command-line options control whether a reset request is turned into a shutdown request instead, the same treatment is given to reset_requested. This patch adds an internal enum ShutdownCause that describes reasons that a shutdown can be requested, and changes qemu_system_reset() to pass the reason through, although for now nothing is actually changed with regards to what gets reported. The enum could be exported via QAPI at a later date, if deemed necessary, but for now, there has not been a request to expose that much detail to end clients. For the most part, we turn 0 into SHUTDOWN_CAUSE_NONE, and 1 into SHUTDOWN_CAUSE_HOST_ERROR; the only specific case where we have enough information right now to use a different value is when we are reacting to a host signal. It will take a further patch to edit all call-sites that can trigger a reset or shutdown request to properly pass in any other reasons; this patch includes TODOs to point such places out. qemu_system_reset() trades its 'bool report' parameter for a 'ShutdownCause reason', with all non-zero values having the same effect; this lets us get rid of the weird #defines for VMRESET_* as synonyms for bools. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20170515214114.15442-3-eblake@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
2017-05-23e1000e: Fix ICR "Other" causes clear logicSameeh Jubran
This commit fixes a bug which causes the guest to hang. The bug was observed upon a "receive overrun" (bit #6 of the ICR register) interrupt which could be triggered post migration in a heavy traffic environment. Even though the "receive overrun" bit (#6) is masked out by the IMS register (refer to the log below) the driver still receives an interrupt as the "receive overrun" bit (#6) causes the "Other" - bit #24 of the ICR register - bit to be set as documented below. The driver handles the interrupt and clears the "Other" bit (#24) but doesn't clear the "receive overrun" bit (#6) which leads to an infinite loop. Apparently the Windows driver expects that the "receive overrun" bit and other ones - documented below - to be cleared when the "Other" bit (#24) is cleared. So to sum that up: 1. Bit #6 of the ICR register is set by heavy traffic 2. As a results of setting bit #6, bit #24 is set 3. The driver receives an interrupt for bit 24 (it doesn't receieve an interrupt for bit #6 as it is masked out by IMS) 4. The driver handles and clears the interrupt of bit #24 5. Bit #6 is still set. 6. 2 happens all over again The Interrupt Cause Read - ICR register: The ICR has the "Other" bit - bit #24 - that is set when one or more of the following ICR register's bits are set: LSC - bit #2, RXO - bit #6, MDAC - bit #9, SRPD - bit #16, ACK - bit #17, MNG - bit #18 This bug can occur with any of these bits depending on the driver's behaviour and the way it configures the device. However, trying to reproduce it with any bit other than RX0 is challenging and came to failure as the drivers don't implement most of these bits, trying to reproduce it with LSC (Link Status Change - bit #2) bit didn't succeed too as it seems that Windows handles this bit differently. Log sample of the storm: 27563@1494850819.411877:e1000e_irq_pending_interrupts ICR PENDING: 0x1000000 (ICR: 0x815000c2, IMS: 0x1a00004) 27563@1494850819.411900:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x815000c2, IMS: 0xa00004) 27563@1494850819.411915:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x815000c2, IMS: 0xa00004) 27563@1494850819.412380:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x815000c2, IMS: 0xa00004) 27563@1494850819.412395:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x815000c2, IMS: 0xa00004) 27563@1494850819.412436:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x815000c2, IMS: 0xa00004) 27563@1494850819.412441:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x815000c2, IMS: 0xa00004) 27563@1494850819.412998:e1000e_irq_pending_interrupts ICR PENDING: 0x1000000 (ICR: 0x815000c2, IMS: 0x1a00004) * This bug behaviour wasn't observed with the Linux driver. This commit solves: https://bugzilla.redhat.com/show_bug.cgi?id=1447935 https://bugzilla.redhat.com/show_bug.cgi?id=1449490 Cc: qemu-stable@nongnu.org Signed-off-by: Sameeh Jubran <sjubran@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2017-05-23virtio-net: fix wild pointer when remove virtio-net queuesYunjian Wang
The tx_bh or tx_timer will free in virtio_net_del_queue() function, when removing virtio-net queues if the guest doesn't support multiqueue. But it might be still referenced by virtio_net_set_status(), which needs to be set NULL. And also the tx_waiting needs to be set zero to prevent virtio_net_set_status() accessing tx_bh or tx_timer. Cc: qemu-stable@nongnu.org Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2017-05-22numa: Silence incomplete mapping warning under qtestIgor Mammedov
Silence "make check" warnings triggered by the numa/mon/cpus/partial test case. Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1495094971-177754-4-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-19Merge remote-tracking branch 'kraxel/tags/pull-audio-20170519-1' into stagingStefan Hajnoczi
audio: move & rename soundhw init code. # gpg: Signature made Fri 19 May 2017 12:22:51 PM BST # gpg: using RSA key 0x4CB6D8EED3E87138 # gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>" # gpg: aka "Gerd Hoffmann <gerd@kraxel.org>" # gpg: aka "Gerd Hoffmann (private) <kraxel@gmail.com>" # Primary key fingerprint: A032 8CFF B93A 17A7 9901 FE7D 4CB6 D8EE D3E8 7138 * kraxel/tags/pull-audio-20170519-1: audio: Rename hw/audio/audio.h to hw/audio/soundhw.h audio: Rename audio_init() to soundhw_init() audio: Move arch_init audio code to hw/audio/soundhw.c Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-05-19vfio/ccw: update sense data if a unit check is pendingDong Jia Shi
Concurrent-sense data is currently not delivered. This patch stores the concurrent-sense data to the subchannel if a unit check is pending and the concurrent-sense bit is enabled. Then a TSCH can retreive the right IRB data back to the guest. Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Message-Id: <20170517004813.58227-13-bjsdjshi@linux.vnet.ibm.com> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2017-05-19s390x/css: ccw translation infrastructureXiao Feng Ren
Implement a basic infrastructure of handling channel I/O instruction interception for passed through subchannels: 1. Branch the code path of instruction interception handling by SubChannel type. 2. For a passed-through subchannel, issue the ORB to kernel to do ccw translation and perform an I/O operation. 3. Assign different condition code based on the I/O result, or trigger a program check. Signed-off-by: Xiao Feng Ren <renxiaof@linux.vnet.ibm.com> Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Message-Id: <20170517004813.58227-12-bjsdjshi@linux.vnet.ibm.com> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2017-05-19s390x/css: introduce and realize ccw-request callbackXiao Feng Ren
Introduce a new callback on subchannel to handle ccw-request. Realize the callback in vfio-ccw device. Besides, resort to the event notifier handler to handling the ccw-request results. 1. Pread the I/O results via MMIO region. 2. Update the scsw info to guest. 3. Inject an I/O interrupt to notify guest the I/O result. Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Xiao Feng Ren <renxiaof@linux.vnet.ibm.com> Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Message-Id: <20170517004813.58227-11-bjsdjshi@linux.vnet.ibm.com> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2017-05-19vfio/ccw: get irqs info and set the eventfd fdDong Jia Shi
vfio-ccw resorts to the eventfd mechanism to communicate with userspace. We fetch the irqs info via the ioctl VFIO_DEVICE_GET_IRQ_INFO, register a event notifier to get the eventfd fd which is sent to kernel via the ioctl VFIO_DEVICE_SET_IRQS, then we can implement read operation once kernel sends the signal. Reviewed-by: Eric Auger <eric.auger@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Message-Id: <20170517004813.58227-10-bjsdjshi@linux.vnet.ibm.com> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2017-05-19vfio/ccw: get io region infoDong Jia Shi
vfio-ccw provides an MMIO region for I/O operations. We fetch its information via ioctls here, then we can use it performing I/O instructions and retrieving I/O results later on. Reviewed-by: Eric Auger <eric.auger@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Message-Id: <20170517004813.58227-9-bjsdjshi@linux.vnet.ibm.com> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2017-05-19vfio/ccw: vfio based subchannel passthrough driverXiao Feng Ren
We use the IOMMU_TYPE1 of VFIO to realize the subchannels passthrough, implement a vfio based subchannels passthrough driver called "vfio-ccw". Support qemu parameters in the style of: "-device vfio-ccw,sysfsdev=$mdev_file_path,devno=xx.x.xxxx' Reviewed-by: Eric Auger <eric.auger@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Xiao Feng Ren <renxiaof@linux.vnet.ibm.com> Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Message-Id: <20170517004813.58227-8-bjsdjshi@linux.vnet.ibm.com> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2017-05-19s390x/css: device support for s390-ccw passthroughDong Jia Shi
In order to support subchannels pass-through, we introduce a s390 subchannel device called "s390-ccw" to hold the real subchannel info. The s390-ccw devices inherit from the abstract CcwDevice which connect to the existing virtual-css-bus. Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Message-Id: <20170517004813.58227-7-bjsdjshi@linux.vnet.ibm.com> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>