aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-05-16curl: convert readv to coroutinesPaolo Bonzini
This is pretty simple. The bottom half goes away because, unlike bdrv_aio_readv, coroutine-based read can return immediately without yielding. However, for simplicity I kept the former bottom half handler in a separate function. Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170515100059.15795-7-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>
2017-05-16curl: convert CURLAIOCB to byte valuesPaolo Bonzini
This is in preparation for the conversion from bdrv_aio_readv to bdrv_co_preadv, and it also requires changing some of the size_t values to uint64_t. This was broken before for disks > 2TB, but now it would break at 4GB. Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170515100059.15795-6-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>
2017-05-16curl: split curl_find_state/curl_init_statePaolo Bonzini
If curl_easy_init fails, a CURLState is left with s->in_use = 1. Split curl_init_state in two, so that we can distinguish the two failures and call curl_clean_state if needed. While at it, simplify curl_find_state, removing a dummy loop. The aio_poll loop is moved to the sole caller that needs it. Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170515100059.15795-5-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>
2017-05-16curl: avoid recursive locking of BDRVCURLState mutexPaolo Bonzini
The curl driver has a ugly hack where, if it cannot find an empty CURLState, it just uses aio_poll to wait for one to be empty. This is probably buggy when used together with dataplane, and the simplest way to fix it is to use coroutines instead. A more immediate effect of the bug however is that it can cause a recursive call to curl_readv_bh_cb and recursively taking the BDRVCURLState mutex. This causes a deadlock. The fix is to unlock the mutex around aio_poll, but for cleanliness we should also take the mutex around all calls to curl_init_state, even if reaching the unlock/lock pair is impossible. The same is true for curl_clean_state. Reported-by: Kun Wei <kuwei@redhat.com> Tested-by: Richard W.M. Jones <rjones@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20170515100059.15795-4-pbonzini@redhat.com Cc: qemu-stable@nongnu.org Cc: Jeff Cody <jcody@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>
2017-05-16curl: never invoke callbacks with s->mutex heldPaolo Bonzini
All curl callbacks go through curl_multi_do, and hence are called with s->mutex held. Note that with comments, and make curl_read_cb drop the lock before invoking the callback. Likewise for curl_find_buf, where the callback can be invoked by the caller. Cc: qemu-stable@nongnu.org Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170515100059.15795-3-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>
2017-05-16curl: strengthen assertion in curl_clean_statePaolo Bonzini
curl_clean_state should only be called after all AIOCBs have been completed. This is not so obvious for the call from curl_detach_aio_context, so assert that. Cc: qemu-stable@nongnu.org Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170515100059.15795-2-pbonzini@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>
2017-05-16block: curl: Allow passing cookies via QCryptoSecretPeter Krempa
Since cookies can contain sensitive data (session ID, etc ...) it is desired to hide them from the prying eyes of users. Add a possibility to pass them via the secret infrastructure. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1447413 Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: f4a22cdebdd0bca6a13a43a2a6deead7f2ec4bb3.1493906281.git.pkrempa@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>
2017-05-12Merge tag 'tracing-pull-request' into stagingStefan Hajnoczi
# gpg: Signature made Fri 12 May 2017 10:38:07 AM EDT # gpg: using RSA key 0x9CA4ABB381AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" # Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8 * tag 'tracing-pull-request': trace: add sanity check Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-05-12Merge tag 'block-pull-request' into stagingStefan Hajnoczi
# gpg: Signature made Fri 12 May 2017 10:37:12 AM EDT # gpg: using RSA key 0x9CA4ABB381AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" # Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8 * tag 'block-pull-request': aio: add missing aio_notify() to aio_enable_external() block: Simplify BDRV_BLOCK_RAW recursion coroutine: remove GThread implementation Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-05-12Merge remote-tracking branch 'kwolf/tags/for-upstream' into stagingStefan Hajnoczi
Block layer patches # gpg: Signature made Thu 11 May 2017 10:31:37 AM EDT # gpg: using RSA key 0x7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * kwolf/tags/for-upstream: (58 commits) MAINTAINERS: Add qemu-progress to the block layer qcow2: Discard/zero clusters by byte count qcow2: Assert that cluster operations are aligned qcow2: Optimize write zero of unaligned tail cluster iotests: Add test 179 to cover write zeroes with unmap iotests: Improve _filter_qemu_img_map qcow2: Optimize zero_single_l2() to minimize L2 churn qcow2: Make distinction between zero cluster types obvious qcow2: Name typedef for cluster type qcow2: Correctly report status of preallocated zero clusters block: Update comments on BDRV_BLOCK_* meanings qcow2: Use consistent switch indentation qcow2: Nicer variable names in qcow2_update_snapshot_refcount() tests: Add coverage for recent block geometry fixes blkdebug: Add ability to override unmap geometries blkdebug: Simplify override logic blkdebug: Add pass-through write_zero and discard support blkdebug: Refactor error injection blkdebug: Sanity check block layer guarantees qemu-io: Switch 'map' output to byte-based reporting ... Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-05-12trace: add sanity checkAnthony Xu
If trace backend is set to TRACE_NOP, trace_get_vcpu_event_count returns 0, cause bitmap_new call abort. The abort can be triggered as follows: $ ./configure --enable-trace-backend=nop --target-list=x86_64-softmmu $ gdb ./x86_64-softmmu/qemu-system-x86_64 -M q35,accel=kvm -m 1G (gdb) bt #0 0x00007ffff04e25f7 in raise () from /lib64/libc.so.6 #1 0x00007ffff04e3ce8 in abort () from /lib64/libc.so.6 #2 0x00005555559de905 in bitmap_new (nbits=<optimized out>) at /home/root/git/qemu2.git/include/qemu/bitmap.h:96 #3 cpu_common_initfn (obj=0x555556621d30) at qom/cpu.c:399 #4 0x0000555555a11869 in object_init_with_type (obj=0x555556621d30, ti=0x55555656bbb0) at qom/object.c:341 #5 0x0000555555a11869 in object_init_with_type (obj=0x555556621d30, ti=0x55555656bd30) at qom/object.c:341 #6 0x0000555555a11efc in object_initialize_with_type (data=data@entry=0x555556621d30, size=76560, type=type@entry=0x55555656bd30) at qom/object.c:376 #7 0x0000555555a12061 in object_new_with_type (type=0x55555656bd30) at qom/object.c:484 #8 0x0000555555a121c5 in object_new (typename=typename@entry=0x555556550340 "qemu64-x86_64-cpu") at qom/object.c:494 #9 0x00005555557f6e3d in pc_new_cpu (typename=typename@entry=0x555556550340 "qemu64-x86_64-cpu", apic_id=0, errp=errp@entry=0x5555565391b0 <error_fatal>) at /home/root/git/qemu2.git/hw/i386/pc.c:1101 #10 0x00005555557fa33e in pc_cpus_init (pcms=pcms@entry=0x5555565f9690) at /home/root/git/qemu2.git/hw/i386/pc.c:1184 #11 0x00005555557fe0f6 in pc_q35_init (machine=0x5555565f9690) at /home/root/git/qemu2.git/hw/i386/pc_q35.c:121 #12 0x000055555574fbad in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4562 Signed-off-by: Anthony Xu <anthony.xu@intel.com> Message-id: 1494369432-15418-1-git-send-email-anthony.xu@intel.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-05-12aio: add missing aio_notify() to aio_enable_external()Stefan Hajnoczi
The main loop uses aio_disable_external()/aio_enable_external() to temporarily disable processing of external AioContext clients like device emulation. This allows monitor commands to quiesce I/O and prevent the guest from submitting new requests while a monitor command is in progress. The aio_enable_external() API is currently broken when an IOThread is in aio_poll() waiting for fd activity when the main loop re-enables external clients. Incrementing ctx->external_disable_cnt does not wake the IOThread from ppoll(2) so fd processing remains suspended and leads to unresponsive emulated devices. This patch adds an aio_notify() call to aio_enable_external() so the IOThread is kicked out of ppoll(2) and will re-arm the file descriptors. The bug can be reproduced as follows: $ qemu -M accel=kvm -m 1024 \ -object iothread,id=iothread0 \ -device virtio-scsi-pci,iothread=iothread0,id=virtio-scsi-pci0 \ -drive if=none,id=drive0,aio=native,cache=none,format=raw,file=test.img \ -device scsi-hd,id=scsi-hd0,drive=drive0 \ -qmp tcp::5555,server,nowait $ scripts/qmp/qmp-shell localhost:5555 (qemu) blockdev-snapshot-sync device=drive0 snapshot-file=sn1.qcow2 mode=absolute-paths format=qcow2 After blockdev-snapshot-sync completes the SCSI disk will be unresponsive. This leads to request timeouts inside the guest. Reported-by: Qianqian Zhu <qizhu@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20170508180705.20609-1-stefanha@redhat.com Suggested-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-05-12block: Simplify BDRV_BLOCK_RAW recursionEric Blake
Since we are already in coroutine context during the body of bdrv_co_get_block_status(), we can shave off a few layers of wrappers when recursing to query the protocol when a format driver returned BDRV_BLOCK_RAW. Note that we are already using the correct recursion later on in the same function, when probing whether the protocol layer is sparse in order to find out if we can add BDRV_BLOCK_ZERO to an existing BDRV_BLOCK_DATA|BDRV_BLOCK_OFFSET_VALID. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20170504173745.27414-1-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-05-12coroutine: remove GThread implementationDaniel P. Berrange
The GThread implementation is not functional enough to actually run QEMU reliably. While it was potentially useful for debugging, we have a scripts/qemugdb/coroutine.py to enable tracing of ucontext coroutines in GDB, so that removes the only reason for GThread to exist. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Acked-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-05-11maintainers: Add myself as linux-user reviewerLaurent Vivier
I volunteer to review linux-user patches. Adding myself will help to not miss some of them. Signed-off-by: Laurent Vivier <laurent@vivier.eu> Acked-by: Riku Voipio <riku.voipio@linaro.org> Message-id: 20170510153950.29343-1-laurent@vivier.eu Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-05-11Merge remote-tracking branch 'mreitz/tags/pull-block-2017-05-11' into ↵Kevin Wolf
queue-block Block patches for the block queue. # gpg: Signature made Thu May 11 14:28:41 2017 CEST # gpg: using RSA key 0xF407DB0061D5CF40 # gpg: Good signature from "Max Reitz <mreitz@redhat.com>" # Primary key fingerprint: 91BE B60A 30DB 3E88 57D1 1829 F407 DB00 61D5 CF40 * mreitz/tags/pull-block-2017-05-11: (22 commits) MAINTAINERS: Add qemu-progress to the block layer qcow2: Discard/zero clusters by byte count qcow2: Assert that cluster operations are aligned qcow2: Optimize write zero of unaligned tail cluster iotests: Add test 179 to cover write zeroes with unmap iotests: Improve _filter_qemu_img_map qcow2: Optimize zero_single_l2() to minimize L2 churn qcow2: Make distinction between zero cluster types obvious qcow2: Name typedef for cluster type qcow2: Correctly report status of preallocated zero clusters block: Update comments on BDRV_BLOCK_* meanings qcow2: Use consistent switch indentation qcow2: Nicer variable names in qcow2_update_snapshot_refcount() tests: Add coverage for recent block geometry fixes blkdebug: Add ability to override unmap geometries blkdebug: Simplify override logic blkdebug: Add pass-through write_zero and discard support blkdebug: Refactor error injection blkdebug: Sanity check block layer guarantees qemu-io: Switch 'map' output to byte-based reporting ... Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-05-11MAINTAINERS: Add qemu-progress to the block layerMax Reitz
util/qemu-progress.c is currently unmaintained. The only user of its functionality is qemu-img, so it effectively is part of the block layer. Suggested-by: Fam Zheng <famz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20170428165517.30341-1-mreitz@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-05-11qcow2: Discard/zero clusters by byte countEric Blake
Passing a byte offset, but sector count, when we ultimately want to operate on cluster granularity, is madness. Clean up the external interfaces to take both offset and count as bytes, while still keeping the assertion added previously that the caller must align the values to a cluster. Then rename things to make sure backports don't get confused by changed units: instead of qcow2_discard_clusters() and qcow2_zero_clusters(), we now have qcow2_cluster_discard() and qcow2_cluster_zeroize(). The internal functions still operate on clusters at a time, and return an int for number of cleared clusters; but on an image with 2M clusters, a single L2 table holds 256k entries that each represent a 2M cluster, totalling well over INT_MAX bytes if we ever had a request for that many bytes at once. All our callers currently limit themselves to 32-bit bytes (and therefore fewer clusters), but by making this function 64-bit clean, we have one less place to clean up if we later improve the block layer to support 64-bit bytes through all operations (with the block layer auto-fragmenting on behalf of more-limited drivers), rather than the current state where some interfaces are artificially limited to INT_MAX at a time. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170507000552.20847-13-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-05-11qcow2: Assert that cluster operations are alignedEric Blake
We already audited (in commit 0c1bd469) that qcow2_discard_clusters() is only passed cluster-aligned start values; but we can further tighten the assertion that the only unaligned end value is at EOF. Recent commits have taken advantage of an unaligned tail cluster, for both discard and write zeroes. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170507000552.20847-12-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-05-11qcow2: Optimize write zero of unaligned tail clusterEric Blake
We've already improved discards to operate efficiently on the tail of an unaligned qcow2 image; it's time to make a similar improvement to write zeroes. The special case is only valid at the tail cluster of a file, where we must recognize that any sectors beyond the image end would implicitly read as zero, and therefore should not penalize our logic for widening a partial cluster into writing the whole cluster as zero. However, note that for now, the special case of end-of-file is only recognized if there is no backing file, or if the backing file has the same length; that's because when the backing file is shorter than the active layer, we don't have code in place to recognize that reads of a sector unallocated at the top and beyond the backing end-of-file are implicitly zero. It's not much of a real loss, because most people don't use images that aren't cluster-aligned, or where the active layer is a different size than the backing layer (especially where the difference falls within a single cluster). Update test 154 to cover the new scenarios, using two images of intentionally differing length. While at it, fix the test to gracefully skip when run as ./check -qcow2 -o compat=0.10 154 since the older format lacks zero clusters already required earlier in the test. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170507000552.20847-11-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-05-11iotests: Add test 179 to cover write zeroes with unmapEric Blake
No tests were covering write zeroes with unmap. Additionally, I needed to prove that my previous patches for correct status reporting and write zeroes optimizations actually had an impact. The test works for cluster_size between 8k and 2M (for smaller sizes, it fails because our allocation patterns are not contiguous with small clusters - in part, the largest consecutive allocation we tend to get is often bounded by the size covered by one L2 table). Note that testing for zero clusters is tricky: 'qemu-io map' reports whether data comes from the current layer of the image (useful for sniffing out which regions of the file have QCOW_OFLAG_ZERO) - but doesn't show which clusters have mappings; while 'qemu-img map' sees "zero":true for both unallocated and zero clusters for any qcow2 with no backing layer (so less useful at detecting true zero clusters), but reliably shows mappings. So we have to rely on both queries side-by-side at each point of the test. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170507000552.20847-10-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-05-11iotests: Improve _filter_qemu_img_mapEric Blake
Although _filter_qemu_img_map documents that it scrubs offsets, it was only doing so for human mode. Of the existing tests using the filter (97, 122, 150, 154, 176), two of them are affected, but it does not hurt the validity of the tests to not require particular mappings (another test, 66, uses offsets but intentionally does not pass through _filter_qemu_img_map, because it checks that offsets are unchanged before and after an operation). Another justification for this patch is that it will allow a future patch to utilize 'qemu-img map --output=json' to check the status of preallocated zero clusters without regards to the mapping (since the qcow2 mapping can be very sensitive to the chosen cluster size, when preallocation is not in use). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170507000552.20847-9-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-05-11qcow2: Optimize zero_single_l2() to minimize L2 churnEric Blake
Similar to discard_single_l2(), we should try to avoid dirtying the L2 cache when the cluster we are changing already has the right characteristics. Note that by the time we get to zero_single_l2(), BDRV_REQ_MAY_UNMAP is a requirement to unallocate a cluster (this is because the block layer clears that flag if discard.* flags during open requested that we never punch holes - see the conversation around commit 170f4b2e, https://lists.gnu.org/archive/html/qemu-devel/2016-09/msg07306.html). Therefore, this patch can only reuse a zero cluster as-is if either unmapping is not requested, or if the zero cluster was not associated with an allocation. Technically, there are some cases where an unallocated cluster already reads as all zeroes (namely, when there is no backing file [easy: check bs->backing], or when the backing file also reads as zeroes [harder: we can't check bdrv_get_block_status since we are already holding the lock]), where the guest would not immediately see a difference if we left that cluster unallocated. But if the user did not request unmapping, leaving an unallocated cluster is wrong; and even if the user DID request unmapping, keeping a cluster unallocated risks a subtle semantic change of guest-visible contents if a backing file is later added, and it is not worth auditing whether all internal uses such as mirror properly avoid an unmap request. Thus, this patch is intentionally limited to just clusters that are already marked as zero. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170507000552.20847-8-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-05-11qcow2: Make distinction between zero cluster types obviousEric Blake
Treat plain zero clusters differently from allocated ones, so that we can simplify the logic of checking whether an offset is present. Do this by splitting QCOW2_CLUSTER_ZERO into two new enums, QCOW2_CLUSTER_ZERO_PLAIN and QCOW2_CLUSTER_ZERO_ALLOC. I tried to arrange the enum so that we could use 'ret <= QCOW2_CLUSTER_ZERO_PLAIN' for all unallocated types, and 'ret >= QCOW2_CLUSTER_ZERO_ALLOC' for allocated types, although I didn't actually end up taking advantage of the layout. In many cases, this leads to simpler code, by properly combining cases (sometimes, both zero types pair together, other times, plain zero is more like unallocated while allocated zero is more like normal). Signed-off-by: Eric Blake <eblake@redhat.com> Message-id: 20170507000552.20847-7-eblake@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-05-11qcow2: Name typedef for cluster typeEric Blake
Although it doesn't add all that much type safety (this is C, after all), it does add a bit of legibility to use the name QCow2ClusterType instead of a plain int. In particular, qcow2_get_cluster_offset() has an overloaded return type; a QCow2ClusterType on success, and -errno on failure; keeping the cluster type in a separate variable makes it slightly easier for the next patch to make further computations based on the type. Suggested-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-id: 20170507000552.20847-6-eblake@redhat.com [mreitz: Use the new type in two more places (one of them pulled from the next patch)] Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-05-11qcow2: Correctly report status of preallocated zero clustersEric Blake
We were throwing away the preallocation information associated with zero clusters. But we should be matching the well-defined semantics in bdrv_get_block_status(), where (BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID) informs the user which offset is reserved, while still reminding the user that reading from that offset is likely to read garbage. count_contiguous_clusters_by_type() is now used only for unallocated cluster runs, hence it gets renamed and tightened. Making this change lets us see which portions of an image are zero but preallocated, when using qemu-img map --output=json. The --output=human side intentionally ignores all zero clusters, whether or not they are preallocated. The fact that there is no change to qemu-iotests './check -qcow2' merely means that we aren't yet testing this aspect of qemu-img; a later patch will add a test. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170507000552.20847-5-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-05-11block: Update comments on BDRV_BLOCK_* meaningsEric Blake
We had some conflicting documentation: a nice 8-way table that described all possible combinations of DATA, ZERO, and OFFSET_VALID, contrasted with text that implied that OFFSET_VALID always meant raw data could be read directly. Furthermore, the text refers a lot to bs->file, even though the interface was updated back in 67a0fd2a to let the driver pass back a specific BDS (not necessarily bs->file). As the 8-way table is the intended semantics, simplify the rest of the text to get rid of the confusion. ALLOCATED is always set by the block layer for convenience (drivers do not have to worry about it). RAW is used only internally, but by more than the raw driver. Document these additional items on the driver callback. Suggested-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170507000552.20847-4-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-05-11qcow2: Use consistent switch indentationEric Blake
Fix a couple of inconsistent indentations, before an upcoming patch further tweaks the switch statements. (best viewed with 'git diff -b'). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170507000552.20847-3-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-05-11qcow2: Nicer variable names in qcow2_update_snapshot_refcount()Eric Blake
In order to keep checkpatch happy when the next patch changes indentation, we first have to shorten some long lines. The easiest approach is to use a new variable in place of 'offset & L2E_OFFSET_MASK', except that 'offset' is the best name for that variable. Change '[old_]offset' to '[old_]entry' to make room. While touching things, also fix checkpatch warnings about unusual 'for' statements. Suggested by Max Reitz <mreitz@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-id: 20170507000552.20847-2-eblake@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-05-11tests: Add coverage for recent block geometry fixesEric Blake
Use blkdebug's new geometry constraints to emulate setups that have needed past regression fixes: write zeroes asserting when running through a loopback block device with max-transfer smaller than cluster size, and discard rounding away portions of requests not aligned to preferred boundaries. Also, add coverage that the block layer is honoring max transfer limits. For now, a single iotest performs all actions, with the idea that we can add future blkdebug constraint test cases in the same file; but it can be split into multiple iotests if we find reason to run one portion of the test in more setups than what are possible in the other. For reference, the final portion of the test (checking whether discard passes as much as possible to the lowest layers of the stack) works as follows: qemu-io: discard 30M at 80000001, passed to blkdebug blkdebug: discard 511 bytes at 80000001, -ENOTSUP (smaller than blkdebug's 512 align) blkdebug: discard 14371328 bytes at 80000512, passed to qcow2 qcow2: discard 739840 bytes at 80000512, -ENOTSUP (smaller than qcow2's 1M align) qcow2: discard 13M bytes at 77M, succeeds blkdebug: discard 15M bytes at 90M, passed to qcow2 qcow2: discard 15M bytes at 90M, succeeds blkdebug: discard 1356800 bytes at 105M, passed to qcow2 qcow2: discard 1M at 105M, succeeds qcow2: discard 308224 bytes at 106M, -ENOTSUP (smaller than qcow2's 1M align) blkdebug: discard 1 byte at 111457280, -ENOTSUP (smaller than blkdebug's 512 align) Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170429191419.30051-10-eblake@redhat.com [mreitz: For cooperation with image locking, add -r to the qemu-io invocation which verifies the image content] Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-05-11blkdebug: Add ability to override unmap geometriesEric Blake
Make it easier to simulate various unusual hardware setups (for example, recent commits 3482b9b and b8d0a98 affect the Dell Equallogic iSCSI with its 15M preferred and maximum unmap and write zero sizing, or b2f95fe deals with the Linux loopback block device having a max_transfer of 64k), by allowing blkdebug to wrap any other device with further restrictions on various alignments. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170429191419.30051-9-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-05-11blkdebug: Simplify override logicEric Blake
Rather than store into a local variable, then copy to the struct if the value is valid, then reporting errors otherwise, it is simpler to just store into the struct and report errors if the value is invalid. This however requires that the struct store a 64-bit number, rather than a narrower type. Likewise, setting a sane errno value in ret prior to the sequence of parsing and jumping to out: on error makes it easier for the next patch to add a chain of similar checks. Signed-off-by: Eric Blake <eblake@redhat.com> Message-id: 20170429191419.30051-8-eblake@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-05-11blkdebug: Add pass-through write_zero and discard supportEric Blake
In order to test the effects of artificial geometry constraints on operations like write zero or discard, we first need blkdebug to manage these actions. It also allows us to inject errors on those operations, just like we can for read/write/flush. We can also test the contract promised by the block layer; namely, if a device has specified limits on alignment or maximum size, then those limits must be obeyed (for now, the blkdebug driver merely inherits limits from whatever it is wrapping, but the next patch will further enhance it to allow specific limit overrides). This patch intentionally refuses to service requests smaller than the requested alignments; this is because an upcoming patch adds a qemu-iotest to prove that the block layer is correctly handling fragmentation, but the test only works if there is a way to tell the difference at artificial alignment boundaries when blkdebug is using a larger-than-default alignment. If we let the blkdebug layer always defer to the underlying layer, which potentially has a smaller granularity, the iotest will be thwarted. Tested by setting up an NBD server with export 'foo', then invoking: $ ./qemu-io qemu-io> open -o driver=blkdebug blkdebug::nbd://localhost:10809/foo qemu-io> d 0 15M qemu-io> w -z 0 15M Pre-patch, the server never sees the discard (it was silently eaten by the block layer); post-patch it is passed across the wire. Likewise, pre-patch the write is always passed with NBD_WRITE (with 15M of zeroes on the wire), while post-patch it can utilize NBD_WRITE_ZEROES (for less traffic). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170429191419.30051-7-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-05-11blkdebug: Refactor error injectionEric Blake
Rather than repeat the logic at each caller of checking if a Rule exists that warrants an error injection, fold that logic into inject_error(); and rename it to rule_check() for legibility. This will help the next patch, which adds two more callers that need to check rules for the potential of injecting errors. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170429191419.30051-6-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-05-11blkdebug: Sanity check block layer guaranteesEric Blake
Commits 04ed95f4 and 1a62d0ac updated the block layer to auto-fragment any I/O to fit within device boundaries. Additionally, when using a minimum alignment of 4k, we want to ensure the block layer does proper read-modify-write rather than requesting I/O on a slice of a sector. Let's enforce that the contract is obeyed when using blkdebug. For now, blkdebug only allows alignment overrides, and just inherits other limits from whatever device it is wrapping, but a future patch will further enhance things. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170429191419.30051-5-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-05-11qemu-io: Switch 'map' output to byte-based reportingEric Blake
Mixing byte offset and sector allocation counts is a bit confusing. Also, reporting n/m sectors, where m decreases according to the remaining size of the file, isn't really adding any useful information; and reporting an offset at both the front and end of the line, with large amounts of whitespace, is pointless. Update the output to use byte counts and shorter lines, then adjust the affected tests (./check -qcow2 102, ./check -vpc 146). Note that 'qemu-io map' is MUCH weaker than 'qemu-img map'; the former only shows which regions of the active layer are allocated, without regards to where the allocation comes from or whether the allocated portion is known to read as zero (because it is using the weaker bdrv_is_allocated()); while the latter (especially in --output=json mode) reports more details from bdrv_get_block_status(). Signed-off-by: Eric Blake <eblake@redhat.com> Message-id: 20170429191419.30051-4-eblake@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-05-11qemu-io: Switch 'alloc' command to byte-based lengthEric Blake
For the 'alloc' command, accepting an offset in bytes but a length in sectors, and reporting output in sectors, is confusing. Do everything in bytes, and adjust the expected output accordingly. Signed-off-by: Eric Blake <eblake@redhat.com> Message-id: 20170429191419.30051-3-eblake@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-05-11qemu-io: Improve alignment checksEric Blake
Several copy-and-pasted alignment checks exist in qemu-io, which could use some minor improvements: - Manual comparison against 0x1ff is not as clean as using our alignment macros (QEMU_IS_ALIGNED) from osdep.h. - The error messages aren't quite grammatically correct. Suggested-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Suggested-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-id: 20170429191419.30051-2-eblake@redhat.com Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-05-11blockdev: use drained_begin/end for qmp_block_resizeJohn Snow
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1447551 If one tries to issue a block_resize while a guest is busy accessing the disk, it is possible that qemu may deadlock when invoking aio_poll from both the main loop and the iothread. Replace another instance of bdrv_drain_all that doesn't quite belong. Cc: qemu-stable@nongnu.org Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-05-11nvme: Implement Write ZeroesChristoph Hellwig
Signed-off-by: Keith Busch <keith.busch@intel.com> [hch: ported over from qemu-nvme.git to mainline] Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-05-11qemu-img: wait for convert coroutines to completeAnton Nefedov
On error path (like i/o error in one of the coroutines), it's required to - wait for coroutines completion before cleaning the common structures - reenter dependent coroutines so they ever finish Introduced in 2d9187bc65. Cc: qemu-stable@nongnu.org Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com> Reviewed-by: Peter Lieven <pl@kamp.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-05-11file-posix: Remove .bdrv_inactivate/invalidate_cacheKevin Wolf
Now that the block layer takes care to request a lot less permissions for inactive nodes, the special-casing in file-posix isn't necessary any more. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2017-05-11block: Fix write/resize permissions for inactive imagesKevin Wolf
Format drivers for inactive nodes don't need write/resize permissions on their bs->file and can share write/resize with another VM (in fact, this is the whole point of keeping images inactive). Represent this fact in the op blocker system, so that image locking does the right thing without special-casing inactive images. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2017-05-11block: Inactivate parents before childrenKevin Wolf
The proper order for inactivating block nodes is that first the parents get inactivated and then the children. If we do things in this order, we can assert that we didn't accidentally leave a parent activated when one of its child nodes is inactive. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2017-05-11block: Drop permissions when migration completesKevin Wolf
With image locking, permissions affect other qemu processes as well. We want to be sure that the destination can run, so let's drop permissions on the source when migration completes. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2017-05-11block: New BdrvChildRole.activate() for blk_resume_after_migration()Kevin Wolf
Instead of manually calling blk_resume_after_migration() in migration code after doing bdrv_invalidate_cache_all(), integrate the BlockBackend activation with cache invalidation into a single function. This is achieved with a new callback in BdrvChildRole that is called by bdrv_invalidate_cache_all(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2017-05-11migration: Unify block node activation error handlingKevin Wolf
Migration code activates all block driver nodes on the destination when the migration completes. It does so by calling bdrv_invalidate_cache_all() and blk_resume_after_migration(). There is one code path for precopy and one for postcopy migration, resulting in four function calls, which used to have three different failure modes. This patch unifies the behaviour so that failure to activate all block nodes is non-fatal, but the error message is logged and the VM isn't automatically started. 'cont' will retry activating the block nodes. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2017-05-11iotests: Extend test 066Max Reitz
066 was supposed to be a test "for discarding preallocated zero clusters", but it did so incompletely: While it did check the image file's integrity after the operation, it did not confirm that the clusters are indeed freed. This patch adds this test. In addition, new cases for writing to preallocated zero clusters are added. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-05-11qcow2: Discard preallocated zero clustersMax Reitz
In discard_single_l2(), we completely discard normal clusters instead of simply turning them into preallocated zero clusters. That means we should probably do the same with such preallocated zero clusters: Discard them instead of keeping them allocated. Reported-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-05-11qcow2: Reuse preallocated zero clustersMax Reitz
Instead of just freeing preallocated zero clusters and completely allocating them from scratch, reuse them. We cannot do this in handle_copied(), however, since this is a COW operation. Therefore, we have to add the new logic to handle_alloc() and simply return the existing offset if it exists. The only catch is that we have to convince qcow2_alloc_cluster_link_l2() not to free the old clusters (because we have reused them). Reported-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>