Age Commit message Author
2019-03-13Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into stagingPeter Maydell
pci, pc, virtio: features, fixes, cleanups intel-iommu scalable option pcie acs emulation beginning for vhost-user-blk reconnect and of vhost-user backend work misc fixes and cleanups Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # gpg: Signature made Wed 13 Mar 2019 02:52:02 GMT # gpg: using RSA key 281F0DB8D28D5469 # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full] # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full] # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * remotes/mst/tags/for_upstream: (26 commits) i386, acpi: check acpi_memory_hotplug capacity in pre_plug gen_pcie_root_port: Add ACS (Access Control Services) capability pcie: Add a simple PCIe ACS (Access Control Services) helper function vhost-user-blk: Add support to get/set inflight buffer libvhost-user: Support tracking inflight I/O in shared memory libvhost-user: Introduce vu_queue_map_desc() libvhost-user: Remove unnecessary FD flag check for event file descriptors vhost-user: Support transferring inflight buffer between qemu and backend nvdimm: use NVDIMM_ACPI_IO_LEN for the proper IO size nvdimm: use *function* directly instead of allocating it again nvdimm: fix typo in nvdimm_build_nvdimm_devices argument intel_iommu: add scalable-mode option to make scalable mode work intel_iommu: add 256 bits qi_desc support intel_iommu: scalable mode emulation libvhost-user: add vu_queue_unpop() libvhost-user-glib: export vug_source_new() vhost-user: split vhost_user_read() vhost-user: wrap some read/write with retry handling libvhost-user: exit by default on VHOST_USER_NONE vhost-user: simplify vhost_user_init/vhost_user_cleanup ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-03-13Merge remote-tracking branch 'remotes/jnsnow/tags/bitmaps-pull-request' into stagingPeter Maydell
Pull request # gpg: Signature made Tue 12 Mar 2019 20:23:08 GMT # gpg: using RSA key F9B7ABDBBCACDF95BE76CBD07DEF8106AAFC390E # gpg: Good signature from "John Snow (John Huston) <jsnow@redhat.com>" [full] # Primary key fingerprint: FAEB 9711 A12C F475 812F 18F2 88A9 064D 1835 61EB # Subkey fingerprint: F9B7 ABDB BCAC DF95 BE76 CBD0 7DEF 8106 AAFC 390E * remotes/jnsnow/tags/bitmaps-pull-request: (22 commits) tests/qemu-iotests: add bitmap resize test 246 block/qcow2-bitmap: Allow resizes with persistent bitmaps block/qcow2-bitmap: Don't check size for IN_USE bitmap docs/interop/qcow2: Improve bitmap flag in_use specification bitmaps: Fix typo in function name block/dirty-bitmaps: implement inconsistent bit block/dirty-bitmaps: disallow busy bitmaps as merge source block/dirty-bitmaps: prohibit removing readonly bitmaps block/dirty-bitmaps: prohibit readonly bitmaps for backups block/dirty-bitmaps: add block_dirty_bitmap_check function block/dirty-bitmap: add inconsistent status block/dirty-bitmaps: add inconsistent bit iotests: add busy/recording bit test to 124 blockdev: remove unused paio parameter documentation block/dirty-bitmaps: move comment block block/dirty-bitmaps: unify qmp_locked and user_locked calls block/dirty-bitmap: explicitly lock bitmaps with successors nbd: change error checking order for bitmaps block/dirty-bitmap: change semantics of enabled predicate block/dirty-bitmap: remove set/reset assertions against enabled bit ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org> # Conflicts: # tests/qemu-iotests/group
2019-03-13Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into stagingPeter Maydell
Block layer patches: - file-posix: Make auto-read-only dynamic - Add x-blockdev-reopen QMP command - Finalize block-latency-histogram QMP command - gluster: Build fixes for newer lib version # gpg: Signature made Tue 12 Mar 2019 19:30:31 GMT # gpg: using RSA key 7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (28 commits) qemu-iotests: Test the x-blockdev-reopen QMP command block: Add an 'x-blockdev-reopen' QMP command block: Remove the AioContext parameter from bdrv_reopen_multiple() block: Add bdrv_reset_options_allowed() block: Add a 'mutable_opts' field to BlockDriver block: Allow changing the backing file on reopen block: Allow omitting the 'backing' option in certain cases block: Handle child references in bdrv_reopen_queue() block: Add 'keep_old_opts' parameter to bdrv_reopen_queue() block: Freeze the backing chain for the duration of the stream job block: Freeze the backing chain for the duration of the mirror job block: Freeze the backing chain for the duration of the commit job block: Allow freezing BdrvChild links nvme: fix write zeroes offset and count file-posix: Make auto-read-only dynamic file-posix: Prepare permission code for fd switching file-posix: Lock new fd in raw_reopen_prepare() file-posix: Store BDRVRawState.reopen_state during reopen file-posix: Factor out raw_reconfigure_getfd() file-posix: Fix bdrv_open_flags() for snapshot=on ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-03-13Merge remote-tracking branch 'remotes/rth/tags/pull-dt-20190312' into stagingPeter Maydell
Break out documentation to docs/devel/. Add support for pattern groups. Other misc cleanups for multiple decode functions. # gpg: Signature made Tue 12 Mar 2019 16:59:37 GMT # gpg: using RSA key 64DF38E8AF7E215F # gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [full] # Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A 05C0 64DF 38E8 AF7E 215F * remotes/rth/tags/pull-dt-20190312: decodetree: Properly diagnose fields overflowing an insn decodetree: Prefix extract function names with decode_function decodetree: Allow +- to begin a number initializing a field decodetree: Produce clean output for an empty input file decodetree: Add --static-decode option test/decode: Add tests for PatternGroups decodetree: Allow grouping of overlapping patterns decodetree: Do not unconditionaly return from Pattern.output_code decodetree: Ensure build_tree does not include values outside insnmask decodetree: Document the usefulness of argument sets decodetree: Move documentation to docs/devel/decodetree.rst MAINTAINERS: Add scripts/decodetree.py to the TCG section Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-03-13Merge remote-tracking branch 'remotes/rth/tags/pull-hppa-20190312' into stagingPeter Maydell
Misc fixes affecting HP-UX 10.20. # gpg: Signature made Tue 12 Mar 2019 16:16:32 GMT # gpg: using RSA key 64DF38E8AF7E215F # gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [full] # Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A 05C0 64DF 38E8 AF7E 215F * remotes/rth/tags/pull-hppa-20190312: target/hppa: exit TB if either Data or Instruction TLB changes target/hppa: add TLB protection id check target/hppa: allow multiple itlbp without itlba target/hppa: fix b,gate instruction target/hppa: ignore DIAG opcode target/hppa: remove PSW I/R/Q bit check target/hppa: add TLB trace events target/hppa: report ITLB_EXCP_MISS for ITLB misses target/hppa: fix TLB handling for page 0 target/hppa: fix overwriting source reg in addb target/hppa: Check for page crossings in use_goto_tb Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-03-12i386, acpi: check acpi_memory_hotplug capacity in pre_plugWei Yang
Device realization currently proceeds in this order: hotplug_handler_pre_plug(), dc->realize(), hotplug_handler_plug(). Before realizing and plugging the device, we should allocate the necessary resources and check that the memory-hotplug-support property is enabled. On piix4 and ich9, the memory-hotplug-support property is checked at the plug stage. This means that the device has already been realized and mapped into the guest address space by pc_dimm_plug() by the time the ACPI plug handler is called, where it might fail and crash QEMU by reaching g_assert_not_reached() (piix4) or error_abort (ich9). Fix it by checking whether memory hotplug is enabled at the pre_plug stage, where we can gracefully abort the hotplug request. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> CC: Igor Mammedov <imammedo@redhat.com> CC: Eric Blake <eblake@redhat.com> Message-Id: <20190301033548.6691-1-richardw.yang@linux.intel.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
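The ordering argument in this commit message can be illustrated with a small sketch. This is not QEMU code: ToyMachine, toy_pre_plug() and toy_hotplug() are hypothetical stand-ins, and the only point shown is that a check placed in the pre_plug step rejects the request before anything has been realized or mapped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the machine state; not a QEMU type. */
typedef struct {
    bool memory_hotplug_enabled; /* models the memory-hotplug-support property */
    bool realized;               /* set by the "realize" step */
    bool plugged;                /* set by the "plug" step */
} ToyMachine;

/* pre_plug stage: the only place where the request may still be refused.
 * Returns NULL on success or a static error string. */
static const char *toy_pre_plug(const ToyMachine *m)
{
    if (!m->memory_hotplug_enabled) {
        return "memory hotplug is not enabled";
    }
    return NULL;
}

/* Runs pre_plug, realize and plug in that order. */
static const char *toy_hotplug(ToyMachine *m)
{
    const char *err;

    assert(m != NULL);
    err = toy_pre_plug(m);
    if (err != NULL) {
        return err;       /* nothing has been realized or mapped yet */
    }
    m->realized = true;   /* stands in for dc->realize() */
    m->plugged = true;    /* stands in for the plug handler, which must not fail */
    return NULL;
}
```

With the check in toy_pre_plug(), a machine that has hotplug disabled returns an error while realized and plugged are both still false, so there is nothing to undo.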
2019-03-12gen_pcie_root_port: Add ACS (Access Control Services) capabilityKnut Omang
Claim ACS support in the generic PCIe root port to allow passthrough of individual functions of a device to different guests (in a nested virt. setting) with VFIO. Without this patch, all functions of a device, such as all VFs of an SR-IOV device, will end up in the same IOMMU group. A similar situation occurs on Windows with Hyper-V. In the single function device case, it also has a small cosmetic benefit in that the root port itself is not grouped with the device. VFIO handles that situation in that binding rules only apply to endpoints, so it does not limit passthrough in those cases. Signed-off-by: Knut Omang <knut.omang@oracle.com> Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Message-Id: <319460b483f566dd57487eb3dd340ed4c10aa53c.1550768238.git-series.knut.omang@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
2019-03-12pcie: Add a simple PCIe ACS (Access Control Services) helper functionKnut Omang
Implementing an ACS capability on downstream ports and multifunction endpoints indicates isolation and IOMMU visibility to a finer granularity. This creates smaller IOMMU groups in the guest and thus more flexibility in assigning endpoints to guest userspace or an L2 guest. Signed-off-by: Knut Omang <knut.omang@oracle.com> Message-Id: <07489975121696f5573b0a92baaf3486ef51e35d.1550768238.git-series.knut.omang@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
2019-03-12vhost-user-blk: Add support to get/set inflight bufferXie Yongji
This patch adds support for vhost-user-blk device to get/set inflight buffer from/to backend. Signed-off-by: Xie Yongji <xieyongji@baidu.com> Signed-off-by: Zhang Yu <zhangyu31@baidu.com> Message-Id: <20190228085355.9614-6-xieyongji@baidu.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-03-12libvhost-user: Support tracking inflight I/O in shared memoryXie Yongji
This patch adds support for VHOST_USER_GET_INFLIGHT_FD and VHOST_USER_SET_INFLIGHT_FD message to set/get shared buffer to/from qemu. Then backend can track inflight I/O in this buffer. Signed-off-by: Xie Yongji <xieyongji@baidu.com> Signed-off-by: Zhang Yu <zhangyu31@baidu.com> Message-Id: <20190228085355.9614-5-xieyongji@baidu.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-03-12libvhost-user: Introduce vu_queue_map_desc()Xie Yongji
Introduce vu_queue_map_desc(), which should be independent of vu_queue_pop(). Signed-off-by: Xie Yongji <xieyongji@baidu.com> Signed-off-by: Zhang Yu <zhangyu31@baidu.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20190228085355.9614-4-xieyongji@baidu.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-03-12libvhost-user: Remove unnecessary FD flag check for event file descriptorsXie Yongji
The vu_check_queue_msg_file() has checked the FD flag. So let's delete the redundant check after it. Signed-off-by: Xie Yongji <xieyongji@baidu.com> Signed-off-by: Zhang Yu <zhangyu31@baidu.com> Message-Id: <20190228085355.9614-3-xieyongji@baidu.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-03-12vhost-user: Support transferring inflight buffer between qemu and backendXie Yongji
This patch introduces two new messages, VHOST_USER_GET_INFLIGHT_FD and VHOST_USER_SET_INFLIGHT_FD, to support transferring a shared buffer between qemu and the backend. First, qemu uses VHOST_USER_GET_INFLIGHT_FD to get the shared buffer from the backend. Then qemu should send it back through VHOST_USER_SET_INFLIGHT_FD each time we start vhost-user. The backend uses this shared buffer to track inflight I/O. Qemu should retrieve a new one when the VM is reset. Signed-off-by: Xie Yongji <xieyongji@baidu.com> Signed-off-by: Chai Wen <chaiwen@baidu.com> Signed-off-by: Zhang Yu <zhangyu31@baidu.com> Message-Id: <20190228085355.9614-2-xieyongji@baidu.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-03-12nvdimm: use NVDIMM_ACPI_IO_LEN for the proper IO sizeWei Yang
The IO range is defined to 4 bytes with NVDIMM_ACPI_IO_LEN, so it is more proper to use this macro instead of calculating it by sizeof. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20190227075101.6263-4-richardw.yang@linux.intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com>
2019-03-12nvdimm: use *function* directly instead of allocating it againWei Yang
At the beginning of nvdimm_build_common_dsm(), variable *function* is already allocated for Arg2. This patch reuses variable *function* instead of allocating it again. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20190227075101.6263-3-richardw.yang@linux.intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com>
2019-03-12nvdimm: fix typo in nvdimm_build_nvdimm_devices argumentWei Yang
From dsm_dma_arrea to dsm_dma_area. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20190227075101.6263-2-richardw.yang@linux.intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com>
2019-03-12intel_iommu: add scalable-mode option to make scalable mode workYi Sun
This patch adds an option that lets the user expose Scalable Mode to the guest, using a configuration such as: "-device intel-iommu,caching-mode=on,scalable-mode=on" The Linux iommu driver already supports scalable mode; please refer to this patch set: https://www.spinics.net/lists/kernel/msg2985279.html Signed-off-by: Liu, Yi L <yi.l.liu@intel.com> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com> Message-Id: <1551753295-30167-4-git-send-email-yi.y.sun@linux.intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com>
2019-03-12intel_iommu: add 256 bits qi_desc supportLiu, Yi L
Per Intel(R) VT-d 3.0, the qi_desc is 256 bits in Scalable Mode. This patch adds emulation of 256bits qi_desc. Signed-off-by: Liu, Yi L <yi.l.liu@intel.com> [Yi Sun is co-developer to rebase and refine the patch.] Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <1551753295-30167-3-git-send-email-yi.y.sun@linux.intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-03-12intel_iommu: scalable mode emulationLiu, Yi L
The Intel(R) VT-d 3.0 spec introduces scalable mode address translation to replace extended context mode. This patch extends the current emulator to support Scalable Mode, which includes the root table, context table and new pasid table format changes. intel_iommu now emulates both legacy mode and scalable mode (with a legacy-equivalent capability set). The key points are below: 1. Extend root table operations to support both legacy mode and scalable mode. 2. Extend context table operations to support both legacy mode and scalable mode. 3. Add pasid table operations to support scalable mode. Signed-off-by: Liu, Yi L <yi.l.liu@intel.com> [Yi Sun is co-developer to contribute much to refine the whole commit.] Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com> Message-Id: <1551753295-30167-2-git-send-email-yi.y.sun@linux.intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>
2019-03-12libvhost-user: add vu_queue_unpop()Marc-André Lureau
vhost-user-input will make use of this function to undo some queue pop in case the virtio queue does not have enough room. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20190308140454.32437-11-marcandre.lureau@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-03-12libvhost-user-glib: export vug_source_new()Marc-André Lureau
Simplify the creation of FD sources for other users. This is just convenience to avoid duplicating similar code elsewhere. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20190308140454.32437-10-marcandre.lureau@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-03-12vhost-user: split vhost_user_read()Marc-André Lureau
Split vhost_user_read(), so only header can be read with vhost_user_read_header(). Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20190308140454.32437-8-marcandre.lureau@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-03-12vhost-user: wrap some read/write with retry handlingMarc-André Lureau
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20190308140454.32437-6-marcandre.lureau@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
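This commit carries no description beyond its subject line, so the following is only a generic sketch of what "retry handling" around a read usually looks like in C. The helper name read_retry() is hypothetical, and retrying on EINTR alone is an assumption; the errno values the actual patch retries on are not stated here.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: repeat read() while it is interrupted by a signal.
 * Returns what read() returns once the call either succeeds or fails with
 * an error other than EINTR. */
static ssize_t read_retry(int fd, void *buf, size_t len)
{
    ssize_t r;

    assert(buf != NULL);
    do {
        r = read(fd, buf, len);
    } while (r < 0 && errno == EINTR);
    return r;
}
```

The same do/while shape applies to write(), recvmsg() and sendmsg(); callers still have to handle short reads and other errors themselves.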
2019-03-12libvhost-user: exit by default on VHOST_USER_NONEMarc-André Lureau
Since commit 2566378d6d13bf4d28c7770bdbda5f7682594bbe, libvhost-user no longer panics on disconnect (rc == 0), and instead silently ignores an invalid VHOST_USER_NONE message. Without extra work from the API user, this will simply busy-loop on HUP events. The obvious thing to do is to exit(0) instead, while additional or different work can be done by overriding iface->process_msg(). Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Jens Freimann <jfreimann@redhat.com> Message-Id: <20190308140454.32437-5-marcandre.lureau@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-03-12vhost-user: simplify vhost_user_init/vhost_user_cleanupMarc-André Lureau
Take a VhostUserState* that can be pre-allocated, and initialize it with the associated chardev. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Tiwei Bie <tiwei.bie@intel.com> Message-Id: <20190308140454.32437-4-marcandre.lureau@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-03-12vhost-user: define conventions for vhost-user backendsMarc-André Lureau
As discussed during "[PATCH v4 00/29] vhost-user for input & GPU" review, let's define a common set of backend conventions to help with management layer implementation, and interoperability. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20190308140454.32437-3-marcandre.lureau@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-03-12libvhost-user: fix clang enum-conversion warningMarc-André Lureau
Now that the VhostUserMsg.request field is used for both master & slave requests, since commit d84599f56c820d8c1ac9928a76500dcdfbbf194d: contrib/libvhost-user/libvhost-user.c:953:20: error: implicit conversion from enumeration type 'enum VhostUserSlaveRequest' to different enumeration type 'VhostUserRequest' (aka 'enum VhostUserRequest') [-Werror,-Wenum-conversion] .request = VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG, ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20190308140454.32437-2-marcandre.lureau@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-03-12virtio-balloon: Restore MADV_WILLNEED hint on balloon deflateDavid Gibson
Prior to f6deb6d9 "virtio-balloon: Remove unnecessary MADV_WILLNEED on deflate", the balloon device issued an madvise() MADV_WILLNEED on pages removed from the balloon. That would hint to the host kernel that the pages were likely to be needed by the guest in the near future. It's unclear if this is actually valuable or not, and so f6deb6d9 removed this, essentially ignoring balloon deflate requests. However, concerns have been raised that this might cause a performance regression by causing extra latency for the guest in certain configurations. So, until we can get actual benchmark data to see if that's the case, this restores the old behaviour, issuing a MADV_WILLNEED when a page is removed from the balloon. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20190306030601.21986-4-david@gibson.dropbear.id.au> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-03-12virtio-balloon: Fix possible guest memory corruption with inflates & deflatesDavid Gibson
This fixes a balloon bug with a nasty consequence - potentially corrupting guest memory - but which is extremely unlikely to be triggered in practice. The balloon always works in 4kiB units, but the host could have a larger page size on certain platforms. Since ed48c59 "virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size" we've handled this by accumulating requests to balloon 4kiB subpages until they formed a full host page. Since f6deb6d "virtio-balloon: Remove unnecessary MADV_WILLNEED on deflate" we essentially ignore deflate requests. Suppose we have a host with 8kiB pages, and one host page has subpages A & B. If we get this sequence of events - inflate A deflate A inflate B - the current logic will discard the whole host page. That's incorrect because the guest has deflated subpage A, and could have written important data to it. This patch fixes the problem by adjusting our state information about partially ballooned host pages when deflate requests are received. Fixes: ed48c59 "virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size" Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20190306030601.21986-3-david@gibson.dropbear.id.au> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: David Hildenbrand <david@redhat.com>
2019-03-12virtio-balloon: Don't mismatch g_malloc()/free (CID 1399146)David Gibson
ed48c59875b6 "virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size" introduced a new temporary data structure which tracks 4kiB chunks which have been inserted into the balloon by the guest but don't yet form a full host page which we can discard. Unfortunately, I had a thinko and allocated that structure with g_malloc0() but freed it with a plain free() rather than g_free(). This corrects the problem. Fixes: ed48c59875b6 Reported-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20190306030601.21986-2-david@gibson.dropbear.id.au> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com>
2019-03-12virtio-balloon: fix a use-after-free caseWei Wang
The elem could theoretically contain both outbuf and inbufs. We move the free operation to the end of this function to avoid using elem->in_sg after elem has been freed. Fixes: c13c4153f7 ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT") Reported-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Wei Wang <wei.w.wang@intel.com> CC: Michael S. Tsirkin <mst@redhat.com> CC: Dr. David Alan Gilbert <dgilbert@redhat.com> CC: Juan Quintela <quintela@redhat.com> CC: Peter Xu <peterx@redhat.com> Message-Id: <1552383280-4122-1-git-send-email-wei.w.wang@intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-03-12Merge remote-tracking branch 'remotes/huth-gitlab/tags/pull-request-2019-03-12' into stagingPeter Maydell
- qtest patches - One SD patch (with Reviewed-by from the maintainer) - One license fix patch # gpg: Signature made Tue 12 Mar 2019 09:03:58 GMT # gpg: using RSA key 2ED9D774FE702DB5 # gpg: Good signature from "Thomas Huth <th.huth@gmx.de>" [full] # gpg: aka "Thomas Huth <thuth@redhat.com>" [full] # gpg: aka "Thomas Huth <huth@tuxfamily.org>" [full] # gpg: aka "Thomas Huth <th.huth@posteo.de>" [unknown] # Primary key fingerprint: 27B8 8847 EEE0 2501 18F3 EAB9 2ED9 D774 FE70 2DB5 * remotes/huth-gitlab/tags/pull-request-2019-03-12: scripts/qemugdb: re-license timers.py to GPLv2 or later hw/sd/sdhci: Move PCI-related code into a separate file ahci-test: Drop dependence on global_qtest tests: test-announce-self: fix memory leak Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-03-12qemu-iotests: Test the x-blockdev-reopen QMP commandAlberto Garcia
This patch adds several tests for the x-blockdev-reopen QMP command. Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-12block: Add an 'x-blockdev-reopen' QMP commandAlberto Garcia
This command allows reopening an arbitrary BlockDriverState with a new set of options. Some options (e.g node-name) cannot be changed and some block drivers don't allow reopening, but otherwise this command is modelled after 'blockdev-add' and the state of the reopened BlockDriverState should generally be the same as if it had just been added by 'blockdev-add' with the same set of options. One notable exception is the 'backing' option: 'x-blockdev-reopen' requires that it is always present unless the BlockDriverState in question doesn't have a current or default backing file. This command allows reconfiguring the graph by using the appropriate options to change the children of a node. At the moment it's possible to change a backing file by setting the 'backing' option to the name of the new node, but it should also be possible to add a similar functionality to other block drivers (e.g. Quorum, blkverify). Although the API is unlikely to change, this command is marked experimental for the time being so there's room to see if the semantics need changes. Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-12block: Remove the AioContext parameter from bdrv_reopen_multiple()Alberto Garcia
This parameter has been unused since 1a63a907507fbbcfaee3f622907ec244b Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-12block: Add bdrv_reset_options_allowed()Alberto Garcia
bdrv_reopen_prepare() receives a BDRVReopenState with (among other things) a new set of options to be applied to that BlockDriverState. If an option is missing then it means that we want to reset it to its default value rather than keep the previous one. This way the state of the block device after being reopened is comparable to that of a device added with "blockdev-add" using the same set of options. Not all options from all drivers can be changed this way, however. If the user attempts to reset an immutable option to its default value using this method then we must forbid it. This new function takes a BlockDriverState and a new set of options and checks if there's any option that was previously set but is missing from the new set of options. If the option is present in both sets we don't need to check that they have the same value. The loop at the end of bdrv_reopen_prepare() already takes care of that. Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
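The check described here can be sketched over plain string lists. This is not the QEMU function: toy_reset_options_allowed() and its helper are hypothetical, options are modelled as NULL-terminated arrays of names rather than QDicts, and the list of resettable options is passed in explicitly as an assumption about how a driver would declare them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if 'key' appears in the NULL-terminated list of option names. */
static bool toy_contains(const char *const *list, const char *key)
{
    for (size_t i = 0; list[i] != NULL; i++) {
        if (strcmp(list[i], key) == 0) {
            return true;
        }
    }
    return false;
}

/* Hypothetical model of the check: an option that was set before but is
 * missing from the new set would be reset to its default, which is only
 * acceptable if the option is resettable.
 * Returns NULL if the reopen may proceed, otherwise the name of the first
 * option that cannot be reset. Values are deliberately not compared. */
static const char *toy_reset_options_allowed(const char *const *old_opts,
                                             const char *const *new_opts,
                                             const char *const *resettable)
{
    assert(old_opts != NULL && new_opts != NULL && resettable != NULL);
    for (size_t i = 0; old_opts[i] != NULL; i++) {
        const char *key = old_opts[i];

        if (!toy_contains(new_opts, key) && !toy_contains(resettable, key)) {
            return key;
        }
    }
    return NULL;
}
```

Options present in both sets are skipped on purpose, mirroring the remark above that value comparison is handled elsewhere.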
2019-03-12block: Add a 'mutable_opts' field to BlockDriverAlberto Garcia
If we reopen a BlockDriverState and there is an option that is present in bs->options but missing from the new set of options then we have to return an error unless the driver is able to reset it to its default value. This patch adds a new 'mutable_opts' field to BlockDriver. This is a list of runtime options that can be modified during reopen. If an option in this list is unspecified on reopen then it must be reset (or return an error). Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-12block: Allow changing the backing file on reopenAlberto Garcia
This patch allows the user to change the backing file of an image that is being reopened. Here's what it does: - In bdrv_reopen_prepare(): check that the value of 'backing' points to an existing node or is null. If it points to an existing node it also needs to make sure that replacing the backing file will not create a cycle in the node graph (i.e. you cannot reach the parent from the new backing file). - In bdrv_reopen_commit(): perform the actual node replacement by calling bdrv_set_backing_hd(). There may be temporary implicit nodes between a BDS and its backing file (e.g. a commit filter node). In these cases bdrv_reopen_prepare() looks for the real (non-implicit) backing file and requires that the 'backing' option points to it. Replacing or detaching a backing file is forbidden if there are implicit nodes in the middle. Although x-blockdev-reopen is meant to be used like blockdev-add, there's an important thing that must be taken into account: the only way to set a new backing file is by using a reference to an existing node (previously added with e.g. blockdev-add). If 'backing' contains a dictionary with a new set of options ({"driver": "qcow2", "file": { ... }}) then it is interpreted that the _existing_ backing file must be reopened with those options. Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
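The cycle condition ("you cannot reach the parent from the new backing file") amounts to a reachability test on the node graph. The sketch below uses a hypothetical ToyNode type with a fixed-size child array, and it assumes the existing graph is acyclic so that the recursion terminates; it does not model implicit nodes or the commit step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_CHILDREN 4

/* Hypothetical graph node; not QEMU's BlockDriverState. */
typedef struct ToyNode {
    struct ToyNode *children[TOY_MAX_CHILDREN];
    size_t n_children;
} ToyNode;

/* True if 'target' can be reached from 'start' by following child links.
 * Assumes the graph reachable from 'start' has no cycles. */
static bool toy_reaches(const ToyNode *start, const ToyNode *target)
{
    if (start == target) {
        return true;
    }
    for (size_t i = 0; i < start->n_children; i++) {
        if (toy_reaches(start->children[i], target)) {
            return true;
        }
    }
    return false;
}

/* A new backing node is acceptable if it is NULL (detach) or if the parent
 * is not reachable from it; otherwise attaching it would create a cycle. */
static bool toy_backing_change_ok(const ToyNode *parent,
                                  const ToyNode *new_backing)
{
    assert(parent != NULL);
    return new_backing == NULL || !toy_reaches(new_backing, parent);
}
```

For a chain top -> mid -> base, attaching base directly under top passes the check, while attaching top under base is rejected because top already reaches base.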
2019-03-12block: Allow omitting the 'backing' option in certain casesAlberto Garcia
Of all options of type BlockdevRef used to specify children in BlockdevOptions, 'backing' is the only one that is optional. For "x-blockdev-reopen" we want that if an option is omitted then it must be reset to its default value. The default value of 'backing' means that QEMU opens the backing file specified in the image metadata, but this is not something that we want to support for the reopen operation. Because of this the 'backing' option has to be specified during reopen, pointing to the existing backing file if we want to keep it, or pointing to a different one (or NULL) if we want to replace it (to be implemented in a subsequent patch). In order to simplify things a bit and not to require that the user passes the 'backing' option to every single block device even when it's clearly not necessary, this patch allows omitting this option if the block device being reopened doesn't have a backing file attached _and_ no default backing file is specified in the image metadata. Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-12 block: Handle child references in bdrv_reopen_queue() Alberto Garcia
Children in QMP are specified with BlockdevRef / BlockdevRefOrNull, which can contain a set of child options, a child reference, or NULL. In optional attributes like "backing" it can also be missing. Only the first case (set of child options) is being handled properly by bdrv_reopen_queue(). This patch deals with all the others.

Here's how these cases should be handled when bdrv_reopen_queue() is deciding what to do with each child of a BlockDriverState:

1) Set of child options: if the child was implicitly created (i.e. inherits_from points to the parent) then the options are removed from the parent's options QDict and are passed to the child with a recursive bdrv_reopen_queue() call. This case was already working fine.

2) Child reference: there are two possibilities here.

2a) Reference to the current child: if the child was implicitly created then it is put in the reopen queue, keeping its current set of options (since this was a child reference there was no way to specify a different set of options). If the child is not implicit then it keeps its current set of options but it is not reopened (and therefore does not inherit any new option from the parent).

2b) Reference to a different BDS: the current child is not put in the reopen queue at all. Passing a reference to a different BDS can be used to replace a child, although at the moment no driver implements this, so it results in an error. In any case, the current child is not going to be reopened (and might in fact disappear if it's replaced).

3) NULL: This is similar to (2b). Although no driver allows this yet, it can be used to detach the current child, so it should not be put in the reopen queue.

4) Missing option: at the moment "backing" is the only case where this can happen. With "blockdev-add", leaving "backing" out means that the default backing file is opened. We don't want to open a new image during reopen, so we require that "backing" is always present. We'll relax this requirement a bit in the next patch. If keep_old_opts is true and "backing" is missing then this behaves like 2a (the current child is reopened).

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
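The four cases listed in this commit message can be condensed into a small decision function. This is only an illustrative sketch: the enum names and `classify_child()` are hypothetical and do not exist in QEMU. The message does not say what happens to a non-implicit child that is given a set of options, so the sketch assumes it is left out of the queue. Whether replacing or detaching a child is later rejected by the driver is outside the scope of this function.

```c
#include <assert.h>
#include <stdbool.h>

/* What the user supplied for a child such as "backing" (hypothetical names). */
typedef enum {
    REF_OPTIONS,   /* case 1: a set of child options */
    REF_CURRENT,   /* case 2a: reference to the current child */
    REF_OTHER,     /* case 2b: reference to a different BDS */
    REF_NULL,      /* case 3: explicit NULL */
    REF_MISSING    /* case 4: option left out */
} ChildRefKind;

/* What to do with the current child when building the reopen queue. */
typedef enum {
    ACT_QUEUE_NEW_OPTS,   /* queue the child with the supplied options */
    ACT_QUEUE_KEEP_OPTS,  /* queue the child, keeping its current options */
    ACT_NOT_QUEUED,       /* leave the child out of the queue */
    ACT_ERROR             /* reject the request */
} ChildAction;

static ChildAction classify_child(ChildRefKind kind, bool child_is_implicit,
                                  bool keep_old_opts)
{
    switch (kind) {
    case REF_OPTIONS:
        /* Assumption: a non-implicit child is simply not queued. */
        return child_is_implicit ? ACT_QUEUE_NEW_OPTS : ACT_NOT_QUEUED;
    case REF_CURRENT:
        return child_is_implicit ? ACT_QUEUE_KEEP_OPTS : ACT_NOT_QUEUED;
    case REF_OTHER:
    case REF_NULL:
        /* The current child may be replaced or detached; never reopen it. */
        return ACT_NOT_QUEUED;
    case REF_MISSING:
        if (!keep_old_opts) {
            return ACT_ERROR;  /* "backing" must be present */
        }
        /* Behaves like case 2a. */
        return child_is_implicit ? ACT_QUEUE_KEEP_OPTS : ACT_NOT_QUEUED;
    }
    assert(!"unreachable: unknown ChildRefKind");
    return ACT_ERROR;
}
```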
2019-03-12 block: Add 'keep_old_opts' parameter to bdrv_reopen_queue() Alberto Garcia
The bdrv_reopen_queue() function is used to create a queue with the BDSs that are going to be reopened and their new options. Once the queue is ready bdrv_reopen_multiple() is called to perform the operation. The original options from each one of the BDSs are kept, with the new options passed to bdrv_reopen_queue() applied on top of them. For "x-blockdev-reopen" we want a function that behaves much like "blockdev-add". We want to ignore the previous set of options so that only the ones actually specified by the user are applied, with the rest having their default values. One of the things that we need is a way to tell bdrv_reopen_queue() whether we want to keep the old set of options or not, and that's what this patch does. All current callers are setting this new parameter to true and x-blockdev-reopen will set it to false. Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
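The difference between the two settings of keep_old_opts can be illustrated with a toy option set. This is a sketch under stated assumptions: QEMU stores options in a QDict, whereas the fixed-size `OptSet` type, the two option keys and `effective_options()` below are hypothetical stand-ins chosen to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative option keys; the real option set is much larger. */
enum { OPT_READ_ONLY, OPT_CACHE_DIRECT, OPT_COUNT };

/* A set of options in which each entry may be absent. */
typedef struct {
    bool present[OPT_COUNT];
    int value[OPT_COUNT];
} OptSet;

/* Compute the options that take effect after a reopen.
 * keep_old_opts = true : new options are applied on top of the old ones.
 * keep_old_opts = false: anything not specified falls back to its default. */
static void effective_options(const OptSet *old_opts, const OptSet *new_opts,
                              const int defaults[OPT_COUNT],
                              bool keep_old_opts, int out[OPT_COUNT])
{
    assert(old_opts != NULL && new_opts != NULL);
    for (int i = 0; i < OPT_COUNT; i++) {
        if (new_opts->present[i]) {
            out[i] = new_opts->value[i];
        } else if (keep_old_opts && old_opts->present[i]) {
            out[i] = old_opts->value[i];
        } else {
            out[i] = defaults[i];
        }
    }
}
```

With keep_old_opts set to false, an option that the user leaves out is reset to its default, which is the blockdev-add-like behaviour the message describes for x-blockdev-reopen.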
2019-03-12 block: Freeze the backing chain for the duration of the stream job Alberto Garcia
Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-12 block: Freeze the backing chain for the duration of the mirror job Alberto Garcia
Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-12 block: Freeze the backing chain for the duration of the commit job Alberto Garcia
Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-12 block: Allow freezing BdrvChild links Alberto Garcia
Our permission system is useful to define what operations are allowed on a certain block node and includes things like BLK_PERM_WRITE or BLK_PERM_RESIZE among others. One of the permissions is BLK_PERM_GRAPH_MOD which allows "changing the node that this BdrvChild points to". The exact meaning of this has never been very clear, but it can be understood as "change any of the links connected to the node". This can be used to prevent changing a backing link, but it's too coarse. This patch adds a new 'frozen' attribute to BdrvChild, which forbids detaching the link from the node it points to, and new API to freeze and unfreeze a backing chain. After this change a few functions can fail, so they need additional checks. Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
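A minimal sketch of the 'frozen' idea, assuming a chain where every node has a single backing link. The `Node` and `Link` types and all four function names are hypothetical and are not the API added by the patch. The sketch also assumes that freezing a chain that already contains a frozen link is refused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Node Node;

/* Hypothetical stand-in for a BdrvChild: a link plus a 'frozen' flag. */
typedef struct {
    Node *target;
    bool frozen;
} Link;

struct Node {
    const char *name;
    Link backing;
};

/* True if any backing link between 'top' and 'base' is frozen. */
static bool chain_frozen(const Node *top, const Node *base)
{
    assert(top != NULL);
    for (const Node *n = top; n != base && n->backing.target;
         n = n->backing.target) {
        if (n->backing.frozen) {
            return true;
        }
    }
    return false;
}

/* Freeze every backing link from 'top' down to 'base'; fails if part of
 * the chain is already frozen (assumption of this sketch). */
static bool freeze_chain(Node *top, const Node *base)
{
    if (chain_frozen(top, base)) {
        return false;
    }
    for (Node *n = top; n != base && n->backing.target;
         n = n->backing.target) {
        n->backing.frozen = true;
    }
    return true;
}

static void unfreeze_chain(Node *top, const Node *base)
{
    for (Node *n = top; n != base && n->backing.target;
         n = n->backing.target) {
        n->backing.frozen = false;
    }
}

/* Replacing a backing link can now fail, so callers must check the result. */
static bool set_backing(Node *node, Node *new_target)
{
    if (node->backing.frozen) {
        return false;
    }
    node->backing.target = new_target;
    return true;
}
```

The boolean return value of `set_backing()` reflects the remark in the message that some functions can fail after this change and therefore need additional checks.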
2019-03-12 nvme: fix write zeroes offset and count Keith Busch
The implementation used block units rather than the expected bytes. Fixes: c03e7ef12a9 ("nvme: Implement Write Zeroes") Reported-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
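The unit conversion behind this fix can be sketched as below. The function names and the `lba_shift` parameter (log2 of the logical block size) are hypothetical and are not taken from the patch. The sketch relies on the NVMe convention that the number-of-logical-blocks field is zero-based.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a starting LBA into a byte offset.
 * lba_shift is log2 of the block size, e.g. 9 for 512-byte blocks. */
static uint64_t wz_offset_bytes(uint64_t slba, unsigned lba_shift)
{
    assert(lba_shift < 64);
    return slba << lba_shift;
}

/* Convert the zero-based block count from the command into a byte count. */
static uint64_t wz_count_bytes(uint16_t nlb_zero_based, unsigned lba_shift)
{
    assert(lba_shift < 64);
    return ((uint64_t)nlb_zero_based + 1) << lba_shift;
}
```

For example, with 512-byte blocks, a command that starts at LBA 8 with a field value of 7 covers 8 blocks, so both the byte offset and the byte count are 4096.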
2019-03-12 file-posix: Make auto-read-only dynamic Kevin Wolf
Until now, with auto-read-only=on we tried to open the file read-write first and if that failed, read-only was tried. This is actually not good enough for libvirt, which gives QEMU SELinux permissions for read-write only as soon as it actually intends to write to the image. So we need to be able to switch between read-only and read-write at runtime. This patch makes auto-read-only dynamic, i.e. the file is opened read-only as long as no user of the node has requested write permissions, but it is automatically reopened read-write as soon as the first writer is attached. Conversely, if the last writer goes away, the file is reopened read-only again. bs->read_only is no longer set for auto-read-only=on files even if the file descriptor is opened read-only because it will be transparently upgraded as soon as a writer is attached. This changes the output of qemu-iotests 232. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
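The dynamic behaviour described in this commit message amounts to tracking how many attached users want to write. The following is a simplified model, not the file-posix implementation: the `AutoRoFile` type and both functions are hypothetical, and the actual reopening of the file descriptor is represented only by a flag and a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a file opened with auto-read-only=on. */
typedef struct {
    int writers;       /* attached users that requested write permission */
    bool fd_writable;  /* whether the descriptor is currently read-write */
    int reopen_count;  /* how many times the descriptor was switched */
} AutoRoFile;

/* A user attaches to the node; the first writer triggers a read-write reopen. */
static void attach_user(AutoRoFile *f, bool wants_write)
{
    if (!wants_write) {
        return;
    }
    if (f->writers == 0) {
        f->fd_writable = true;
        f->reopen_count++;
    }
    f->writers++;
}

/* A user detaches; when the last writer goes away, drop back to read-only. */
static void detach_user(AutoRoFile *f, bool wanted_write)
{
    if (!wanted_write) {
        return;
    }
    assert(f->writers > 0);
    f->writers--;
    if (f->writers == 0) {
        f->fd_writable = false;
        f->reopen_count++;
    }
}
```

In this model only the first writer to attach and the last writer to detach cause a switch; readers and additional writers leave the descriptor untouched.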
2019-03-12 file-posix: Prepare permission code for fd switching Kevin Wolf
In order to be able to dynamically reopen the file read-only or read-write, depending on the users that are attached, we need to be able to switch to a different file descriptor during the permission change. This interacts with reopen, which also creates a new file descriptor and performs permission changes internally. In this case, the permission change code must reuse the reopen file descriptor instead of creating a third one. In turn, reopen can drop its code to copy file locks to the new file descriptor because that is now done when applying the new permissions. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-12 file-posix: Lock new fd in raw_reopen_prepare() Kevin Wolf
There is no reason to take locks on the new file descriptor only in raw_reopen_commit(), where error handling is no longer possible. Instead, we can already do this in raw_reopen_prepare(). Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-12 file-posix: Store BDRVRawState.reopen_state during reopen Kevin Wolf
We'll want to access the file descriptor in the reopen_state while processing permission changes in the context of the reopen. Signed-off-by: Kevin Wolf <kwolf@redhat.com>