aboutsummaryrefslogtreecommitdiff
path: root/include/block
AgeCommit message (Collapse)Author
2018-09-25block: Use a single global AioWaitKevin Wolf
When draining a block node, we recurse to its parent and for subtree drains also to its children. A single AIO_WAIT_WHILE() is then used to wait for bdrv_drain_poll() to become true, which depends on all of the nodes we recursed to. However, if the respective child or parent becomes quiescent and calls bdrv_wakeup(), only the AioWait of the child/parent is checked, while AIO_WAIT_WHILE() depends on the AioWait of the original node. Fix this by using a single AioWait for all callers of AIO_WAIT_WHILE(). This may mean that the draining thread gets a few more unnecessary wakeups because an unrelated operation got completed, but we already wake it up when something _could_ have changed rather than only if it has certainly changed. Apart from that, drain is a slow path anyway. In theory it would be possible to use wakeups more selectively and still correctly, but the gains are likely not worth the additional complexity. In fact, this patch is a nice simplification for some places in the code. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-09-25aio-wait: Increase num_waiters even in home threadKevin Wolf
Even if AIO_WAIT_WHILE() is called in the home context of the AioContext, we still want to allow the condition to change depending on other threads as long as they kick the AioWait. Specfically block jobs can be running in an I/O thread and should then be able to kick a drain in the main loop context. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>
2018-09-25blockjob: Wake up BDS when job becomes idleKevin Wolf
In the context of draining a BDS, the .drained_poll callback of block jobs is called. If this returns true (i.e. there is still some activity pending), the drain operation may call aio_poll() with blocking=true to wait for completion. As soon as the pending activity is completed and the job finally arrives in a quiescent state (i.e. its coroutine either yields with busy=false or terminates), the block job must notify the aio_poll() loop to wake up, otherwise we get a deadlock if both are running in different threads. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-09-25block/stream: add block job creation flagsJohn Snow
Add support for taking and passing forward job creation flags. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 20180906130225.5118-4-jsnow@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-09-25block/mirror: add block job creation flagsJohn Snow
Add support for taking and passing forward job creation flags. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 20180906130225.5118-3-jsnow@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-09-25block/commit: add block job creation flagsJohn Snow
Add support for taking and passing forward job creation flags. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 20180906130225.5118-2-jsnow@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-07-30block: Fix documentation for BDRV_REQ_MAY_UNMAPKevin Wolf
BDRV_REQ_MAY_UNMAP in a write_zeroes request does not only allow the driver to unmap the blocks, but it actively requests that the blocks be unmapped afterwards if at all possible. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-07-10block: Use uint64_t for BdrvTrackedRequest byte fieldsFam Zheng
This matches the types used for bytes in the rest parts of block layer. In the case of bdrv_co_truncate, new_bytes can be the image size which probably doesn't fit in a 32 bit int. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-07-10block: Use BdrvChild to discardFam Zheng
Other I/O functions are already using a BdrvChild pointer in the API, so make discard do the same. It makes it possible to initiate the same permission checks before doing I/O, and much easier to share the helper functions for this, which will be added and used by write, truncate and copy range paths. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-07-10block: add BDRV_REQ_SERIALISING flagVladimir Sementsov-Ogievskiy
Serialized writes should be used in copy-on-write of backup(sync=none) for image fleecing scheme. We need to change an assert in bdrv_aligned_pwritev, added in 28de2dcd88de. The assert may fail now, because call to wait_serialising_requests here may become first call to it for this request with serializing flag set. It occurs if the request is aligned (otherwise, we should already set serializing flag before calling bdrv_aligned_pwritev and correspondingly waited for all intersecting requests). However, for aligned requests, we should not care about outdating of previously read data, as there no such data. Therefore, let's just update an assert to not care about aligned requests. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-07-10block: split flags in copy_rangeVladimir Sementsov-Ogievskiy
Pass read flags and write flags separately. This is needed to handle coming BDRV_REQ_NO_SERIALISING clearly in following patches. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-07-10block/io: fix copy_rangeVladimir Sementsov-Ogievskiy
Here two things are fixed: 1. Architecture On each recursion step, we go to the child of src or dst, only for one of them. So, it's wrong to create tracked requests for both on each step. It leads to tracked requests duplication. 2. Wait for serializing requests on write path independently of BDRV_REQ_NO_SERIALISING Before commit 9ded4a01149 "backup: Use copy offloading", BDRV_REQ_NO_SERIALISING was used for only one case: read in copy-on-write operation during backup. Also, the flag was handled only on read path (in bdrv_co_preadv and bdrv_aligned_preadv). After 9ded4a01149, flag is used for not waiting serializing operations on backup target (in same case of copy-on-write operation). This behavior change is unsubstantiated and potentially dangerous, let's drop it and add additional asserts and documentation. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-07-10block: Poll after drain on attaching a nodeKevin Wolf
Commit dcf94a23b1 ('block: Don't poll in parent drain callbacks') removed polling in bdrv_child_cb_drained_begin() on the grounds that the original bdrv_drain() already will poll and BdrvChildRole.drained_begin calls must not cause graph changes (and therefore must not call aio_poll() or the recursion through the graph will break. This reasoning is correct for calls through bdrv_do_drained_begin(). However, BdrvChildRole.drained_begin is also called when a node that is already in a drained section (i.e. bdrv_do_drained_begin() has already returned and therefore can't poll any more) is attached to a new parent. In this case, we must explicitly poll to have all requests completed before the drained new child can be attached to the parent. In bdrv_replace_child_noperm(), we know that we're not inside the recursion of bdrv_do_drained_begin() because graph changes are not allowed there, and bdrv_replace_child_noperm() is a graph change. The call of BdrvChildRole.drained_begin() must therefore be followed by a BDRV_POLL_WHILE() that waits for the completion of requests. Reported-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-07-05Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into stagingPeter Maydell
Block layer patches: - qcow2: Use worker threads for compression to improve performance of 'qemu-img convert -W' and compressed backup jobs - blklogwrites: New filter driver to log write requests to an image in the dm-log-writes format - file-posix: Fix image locking during image creation - crypto: Fix memory leak in error path - Error out instead of silently truncating node names # gpg: Signature made Thu 05 Jul 2018 11:24:33 BST # gpg: using RSA key 7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: file-posix: Unlock FD after creation file-posix: Fix creation locking block/blklogwrites: Add an option for the update interval of the log superblock block/blklogwrites: Add an option for appending to an old log block/blklogwrites: Change log_sector_size from int64_t to uint64_t block/crypto: Fix memory leak in create error path block: Don't silently truncate node names block: Add blklogwrites block: Move two block permission constants to the relevant enum qcow2: add compress threads qcow2: refactor data compression qemu-img: allow compressed not-in-order writes Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-07-05block: Move two block permission constants to the relevant enumAri Sundholm
This allows using the two constants outside of block.c, which will happen in a subsequent patch. Signed-off-by: Ari Sundholm <ari@tuxera.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-07-04block/dirty-bitmap: add bdrv_enable_dirty_bitmap_lockedVladimir Sementsov-Ogievskiy
Add _locked version of bdrv_enable_dirty_bitmap, to fix dirty bitmap migration in the following patch. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20180625165745.25259-2-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>
2018-07-03Merge remote-tracking branch 'remotes/cody/tags/block-pull-request' into stagingPeter Maydell
# gpg: Signature made Tue 03 Jul 2018 04:42:11 BST # gpg: using RSA key BDBE7B27C0DE3057 # gpg: Good signature from "Jeffrey Cody <jcody@redhat.com>" # gpg: aka "Jeffrey Cody <jeff@codyprime.org>" # gpg: aka "Jeffrey Cody <codyprime@gmail.com>" # Primary key fingerprint: 9957 4B4D 3474 90E7 9D98 D624 BDBE 7B27 C0DE 3057 * remotes/cody/tags/block-pull-request: backup: Use copy offloading block: Honour BDRV_REQ_NO_SERIALISING in copy range block: Fix parameter checking in bdrv_co_copy_range_internal Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-07-02block: Honour BDRV_REQ_NO_SERIALISING in copy rangeFam Zheng
This semantics is needed by drive-backup so implement it before using this API there. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 20180703023758.14422-3-famz@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>
2018-07-02nbd/client: Add x-dirty-bitmap to query bitmap from serverEric Blake
In order to test that the NBD server is properly advertising dirty bitmaps, we need a bare minimum client that can request and read the context. Since feature freeze for 3.0 is imminent, this is the smallest workable patch, which replaces the qemu block status report with the results of the NBD server's dirty bitmap (making it very easy to use 'qemu-img map --output=json' to learn where the dirty portions are). Note that the NBD protocol defines a dirty section with the same bit but opposite sense that normal "base:allocation" uses to report an allocated section; so in qemu-img map output, "data":true corresponds to clean, "data":false corresponds to dirty. A more complete solution that allows dirty bitmaps to be queried at the same time as normal block status will be required before this addition can lose the x- prefix. Until then, the fact that this replaces normal status with dirty status means actions like 'qemu-img convert' will likely misbehave due to treating dirty regions of the file as if they are unallocated. The next patch adds an iotest to exercise this new code. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20180702191458.28741-2-eblake@redhat.com>
2018-06-29block: Remove unused sector-based vectored I/OEric Blake
We are gradually moving away from sector-based interfaces, towards byte-based. Now that all callers of vectored I/O have been converted to use our preferred byte-based bdrv_co_p{read,write}v(), we can delete the unused bdrv_co_{read,write}v(). Furthermore, this gets rid of the signature difference between the public bdrv_co_writev() and the callback .bdrv_co_writev (the latter still exists, because some drivers still need more work before they are fully byte-based). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-29file-posix: Make .bdrv_co_truncate asynchronousKevin Wolf
This moves the code to resize an image file to the thread pool to avoid blocking. Creating large images with preallocation with blockdev-create is now actually a background job instead of blocking the monitor (and most other things) until the preallocation has completed. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-29block: Use tracked request for truncateKevin Wolf
When growing an image, block drivers (especially protocol drivers) may initialise the newly added area. I/O requests to the same area need to wait for this initialisation to be completed so that data writes don't get overwritten and reads don't read uninitialised data. To avoid overhead in the fast I/O path by adding new locking in the protocol drivers and to restrict the impact to requests that actually touch the new area, reuse the existing tracked request infrastructure in block/io.c and mark all discard requests as serialising. With this change, it is safe for protocol drivers to make .bdrv_co_truncate actually asynchronous. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-29block: Move bdrv_truncate() implementation to io.cKevin Wolf
This moves the bdrv_truncate() implementation from block.c to block/io.c so it can have access to the tracked requests infrastructure. This involves making refresh_total_sectors() public (in block_int.h). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-29block: Convert .bdrv_truncate callback to coroutine_fnKevin Wolf
bdrv_truncate() is an operation that can block (even for a quite long time, depending on the PreallocMode) in I/O paths that shouldn't block. Convert it to a coroutine_fn so that we have the infrastructure for drivers to make their .bdrv_co_truncate implementation asynchronous. This change could potentially introduce new race conditions because bdrv_truncate() isn't necessarily executed atomically any more. Whether this is a problem needs to be evaluated for each block driver that supports truncate: * file-posix/win32, gluster, iscsi, nfs, rbd, ssh, sheepdog: The protocol drivers are trivially safe because they don't actually yield yet, so there is no change in behaviour. * copy-on-read, crypto, raw-format: Essentially just filter drivers that pass the request to a child node, no problem. * qcow2: The implementation modifies metadata, so it needs to hold s->lock to be safe with concurrent I/O requests. In order to avoid double locking, this requires pulling the locking out into preallocate_co() and using qcow2_write_caches() instead of bdrv_flush(). * qed: Does a single header update, this is fine without locking. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-27linux-aio: properly bubble up errors from initializationNishanth Aravamudan
laio_init() can fail for a couple of reasons, which will lead to a NULL pointer dereference in laio_attach_aio_context(). To solve this, add a aio_setup_linux_aio() function which is called early in raw_open_common. If this fails, propagate the error up. The signature of aio_get_linux_aio() was not modified, because it seems preferable to return the actual errno from the possible failing initialization calls. Additionally, when the AioContext changes, we need to associate a LinuxAioState with the new AioContext. Use the bdrv_attach_aio_context callback and call the new aio_setup_linux_aio(), which will allocate a new AioContext if needed, and return errors on failures. If it fails for any reason, fallback to threaded AIO with an error message, as the device is already in-use by the guest. Add an assert that aio_get_linux_aio() cannot return NULL. Signed-off-by: Nishanth Aravamudan <naravamudan@digitalocean.com> Message-id: 20180622193700.6523-1-naravamudan@digitalocean.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-21nbd/server: introduce NBD_CMD_CACHEVladimir Sementsov-Ogievskiy
Handle nbd CACHE command. Just do read, without sending read data back. Cache mechanism should be done by exported node driver chain. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180413143156.11409-1-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: fix two missing case labels in switch statements] Signed-off-by: Eric Blake <eblake@redhat.com>
2018-06-21nbd/server: implement dirty bitmap exportVladimir Sementsov-Ogievskiy
Handle a new NBD meta namespace: "qemu", and corresponding queries: "qemu:dirty-bitmap:<export bitmap name>". With the new metadata context negotiated, BLOCK_STATUS query will reply with dirty-bitmap data, converted to extents. The new public function nbd_export_bitmap selects which bitmap to export. For now, only one bitmap may be exported. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180609151758.17343-5-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: wording tweaks, minor cleanups, additional tracing] Signed-off-by: Eric Blake <eblake@redhat.com>
2018-06-18block/mirror: Add copy mode QAPI interfaceMax Reitz
This patch allows the user to specify whether to use active or only background mode for mirror block jobs. Currently, this setting will remain constant for the duration of the entire block job. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20180613181823.13618-14-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-18block/dirty-bitmap: Add bdrv_dirty_iter_next_areaMax Reitz
This new function allows to look for a consecutively dirty area in a dirty bitmap. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20180613181823.13618-10-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-18block: Allow graph changes in bdrv_drain_all_begin/end sectionsKevin Wolf
bdrv_drain_all_*() used bdrv_next() to iterate over all root nodes and did a subtree drain for each of them. This works fine as long as the graph is static, but sadly, reality looks different. If the graph changes so that root nodes are added or removed, we would have to compensate for this. bdrv_next() returns each root node only once even if it's the root node for multiple BlockBackends or for a monitor-owned block driver tree, which would only complicate things. The much easier and more obviously correct way is to fundamentally change the way the functions work: Iterate over all BlockDriverStates, no matter who owns them, and drain them individually. Compensation is only necessary when a new BDS is created inside a drain_all section. Removal of a BDS doesn't require any action because it's gone afterwards anyway. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18block: ignore_bds_parents parameter for drain functionsKevin Wolf
In the future, bdrv_drained_all_begin/end() will drain all invidiual nodes separately rather than whole subtrees. This means that we don't want to propagate the drain to all parents any more: If the parent is a BDS, it will already be drained separately. Recursing to all parents is unnecessary work and would make it an O(n²) operation. Prepare the drain function for the changed drain_all by adding an ignore_bds_parents parameter to the internal implementation that prevents the propagation of the drain to BDS parents. We still (have to) propagate it to non-BDS parents like BlockBackends or Jobs because those are not drained separately. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18block: Allow AIO_WAIT_WHILE with NULL ctxKevin Wolf
bdrv_drain_all() wants to have a single polling loop for draining the in-flight requests of all nodes. This means that the AIO_WAIT_WHILE() condition relies on activity in multiple AioContexts, which is polled from the mainloop context. We must therefore call AIO_WAIT_WHILE() from the mainloop thread and use the AioWait notification mechanism. Just randomly picking the AioContext of any non-mainloop thread would work, but instead of bothering to find such a context in the caller, we can just as well accept NULL for ctx. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18block: Don't poll in parent drain callbacksKevin Wolf
bdrv_do_drained_begin() is only safe if we have a single BDRV_POLL_WHILE() after quiescing all affected nodes. We cannot allow that parent callbacks introduce a nested polling loop that could cause graph changes while we're traversing the graph. Split off bdrv_do_drained_begin_quiesce(), which only quiesces a single node without waiting for its requests to complete. These requests will be waited for in the BDRV_POLL_WHILE() call down the call chain. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18block: Drain recursively with a single BDRV_POLL_WHILE()Kevin Wolf
Anything can happen inside BDRV_POLL_WHILE(), including graph changes that may interfere with its callers (e.g. child list iteration in recursive callers of bdrv_do_drained_begin). Switch to a single BDRV_POLL_WHILE() call for the whole subtree at the end of bdrv_do_drained_begin() to avoid such effects. The recursion happens now inside the loop condition. As the graph can only change between bdrv_drain_poll() calls, but not inside of it, doing the recursion here is safe. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18block: Really pause block jobs on drainKevin Wolf
We already requested that block jobs be paused in .bdrv_drained_begin, but no guarantee was made that the job was actually inactive at the point where bdrv_drained_begin() returned. This introduces a new callback BdrvChildRole.bdrv_drained_poll() and uses it to make bdrv_drain_poll() consider block jobs using the node to be drained. For the test case to work as expected, we have to switch from block_job_sleep_ns() to qemu_co_sleep_ns() so that the test job is even considered active and must be waited for when draining the node. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18block: Avoid unnecessary aio_poll() in AIO_WAIT_WHILE()Kevin Wolf
Commit 91af091f923 added an additional aio_poll() to BDRV_POLL_WHILE() in order to make sure that all pending BHs are executed on drain. This was the wrong place to make the fix, as it is useless overhead for all other users of the macro and unnecessarily complicates the mechanism. This patch effectively reverts said commit (the context has changed a bit and the code has moved to AIO_WAIT_WHILE()) and instead polls in the loop condition for drain. The effect is probably hard to measure in any real-world use case because actual I/O will dominate, but if I run only the initialisation part of 'qemu-img convert' where it calls bdrv_block_status() for the whole image to find out how much data there is copy, this phase actually needs only roughly half the time after this patch. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-15block: Factor out qobject_input_visitor_new_flat_confused()Markus Armbruster
Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-15block: Fix -blockdev for certain non-string scalarsMarkus Armbruster
Configuration flows through the block subsystem in a rather peculiar way. Configuration made with -drive enters it as QemuOpts. Configuration made with -blockdev / blockdev-add enters it as QAPI type BlockdevOptions. The block subsystem uses QDict, QemuOpts and QAPI types internally. The precise flow is next to impossible to explain (I tried for this commit message, but gave up after wasting several hours). What I can explain is a flaw in the BlockDriver interface that leads to this bug: $ qemu-system-x86_64 -blockdev node-name=n1,driver=nfs,server.type=inet,server.host=localhost,path=/foo/bar,user=1234 qemu-system-x86_64: -blockdev node-name=n1,driver=nfs,server.type=inet,server.host=localhost,path=/foo/bar,user=1234: Internal error: parameter user invalid QMP blockdev-add is broken the same way. Here's what happens. The block layer passes configuration represented as flat QDict (with dotted keys) to BlockDriver methods .bdrv_file_open(). The QDict's members are typed according to the QAPI schema. nfs_file_open() converts it to QAPI type BlockdevOptionsNfs, with qdict_crumple() and a qobject input visitor. This visitor comes in two flavors. The plain flavor requires scalars to be typed according to the QAPI schema. That's the case here. The keyval flavor requires string scalars. That's not the case here. nfs_file_open() uses the latter, and promptly falls apart for members @user, @group, @tcp-syn-count, @readahead-size, @page-cache-size, @debug. Switching to the plain flavor would fix -blockdev, but break -drive, because there the scalars arrive in nfs_file_open() as strings. The proper fix would be to replace the QDict by QAPI type BlockdevOptions in the BlockDriver interface. Sadly, that's beyond my reach right now. Next best would be to fix the block layer to always pass correctly typed QDicts to the BlockDriver methods. Also beyond my reach. What I can do is throw another hack onto the pile: have nfs_file_open() convert all members to string, so use of the keyval flavor actually works, by replacing qdict_crumple() by new function qdict_crumple_for_keyval_qiv(). The pattern "pass result of qdict_crumple() to qobject_input_visitor_new_keyval()" occurs several times more: * qemu_rbd_open() Same issue as nfs_file_open(), but since BlockdevOptionsRbd has only string members, its only a latent bug. Fix it anyway. * parallels_co_create_opts(), qcow_co_create_opts(), qcow2_co_create_opts(), bdrv_qed_co_create_opts(), sd_co_create_opts(), vhdx_co_create_opts(), vpc_co_create_opts() These work, because they create the QDict with qemu_opts_to_qdict_filtered(), which creates only string scalars. The function sports a TODO comment asking for better typing; that's going to be fun. Use qdict_crumple_for_keyval_qiv() to be safe. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-15block: Add block-specific QDict headerMax Reitz
There are numerous QDict functions that have been introduced for and are used only by the block layer. Move their declarations into an own header file to reflect that. While qdict_extract_subqdict() is in fact used outside of the block layer (in util/qemu-config.c), it is still a function related very closely to how the block layer works with nested QDicts, namely by sometimes flattening them. Therefore, its declaration is put into this header as well and util/qemu-config.c includes it with a comment stating exactly which function it needs. Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20180509165530.29561-7-mreitz@redhat.com> [Copyright note tweaked, superfluous includes dropped] Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-11qapi: add x-block-dirty-bitmap-mergeVladimir Sementsov-Ogievskiy
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 20180606182449.1607-5-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>
2018-06-11block: remove bdrv_dirty_bitmap_make_anonPaolo Bonzini
All this function is doing will be repeated by bdrv_do_release_matching_dirty_bitmap_locked, except resetting bm->persistent. But even that does not matter because the bitmap will be freed. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20180323164254.26487-1-pbonzini@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>
2018-06-11block: Make bdrv_is_writable() publicMax Reitz
This is a useful function for the whole block layer, so make it public. At the same time, users outside of block.c probably do not need to make use of the reopen functionality, so rename the current function to bdrv_is_writable_after_reopen() create a new bdrv_is_writable() function that just passes NULL to it for the reopen queue. Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20180606193702.7113-2-mreitz@redhat.com Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-11block: Add Error parameter to bdrv_amend_optionsMax Reitz
Looking at the qcow2 code that is riddled with error_report() calls, this is really how it should have been from the start. Along the way, turn the target_version/current_version comparisons at the beginning of qcow2_downgrade() into assertions (the caller has to make sure these conditions are met), and rephrase the error message on using compat=1.1 to get refcount widths other than 16 bits. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20180509210023.20283-3-mreitz@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-01file-posix: Implement bdrv_co_copy_rangeFam Zheng
With copy_file_range(2), we can implement the bdrv_co_copy_range semantics. Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 20180601092648.24614-6-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-01block: Introduce API for copy offloadingFam Zheng
Introduce the bdrv_co_copy_range() API for copy offloading. Block drivers implementing this API support efficient copy operations that avoid reading each block from the source device and writing it to the destination devices. Examples of copy offload primitives are SCSI EXTENDED COPY and Linux copy_file_range(2). Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20180601092648.24614-2-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-05-23blockjob: Remove BlockJob.driverKevin Wolf
BlockJob.driver is redundant with Job.driver and only used in very few places any more. Remove it. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-05-23job: Move progress fields to JobKevin Wolf
BlockJob has fields .offset and .len, which are actually misnomers today because they are no longer tied to block device sizes, but just progress counters. As such they make a lot of sense in generic Jobs. This patch moves the fields to Job and renames them to .progress_current and .progress_total to describe their function better. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-05-23job: Add job_transition_to_ready()Kevin Wolf
The transition to the READY state was still performed in the BlockJob layer, in the same function that sent the BLOCK_JOB_READY QMP event. This patch brings the state transition to the Job layer and implements the QMP event using a notifier called from the Job layer, like we already do for other events related to state transitions. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-05-23job: Add job_is_ready()Kevin Wolf
Instead of having a 'bool ready' in BlockJob, add a function that derives its value from the job status. At the same time, this fixes the behaviour to match what the QAPI documentation promises for query-block-job: 'true if the job may be completed'. When the ready flag was introduced in commit ef6dbf1e46e, the flag never had to be reset to match the description because after being ready, the jobs would immediately complete and disappear. Job transactions and manual job finalisation were introduced only later. With these changes, jobs may stay around even after having completed (and they are not ready to be completed a second time), however their patches forgot to reset the ready flag. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-05-23job: Add job_dismiss()Kevin Wolf
This moves block_job_dismiss() to the Job layer. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>