aboutsummaryrefslogtreecommitdiff
path: root/include/block/block.h
AgeCommit message (Collapse)Author
2018-07-10block: Use BdrvChild to discardFam Zheng
Other I/O functions are already using a BdrvChild pointer in the API, so make discard do the same. It makes it possible to initiate the same permission checks before doing I/O, and much easier to share the helper functions for this, which will be added and used by write, truncate and copy range paths. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-07-10block: add BDRV_REQ_SERIALISING flagVladimir Sementsov-Ogievskiy
Serialized writes should be used in copy-on-write of backup(sync=none) for image fleecing scheme. We need to change an assert in bdrv_aligned_pwritev, added in 28de2dcd88de. The assert may fail now, because call to wait_serialising_requests here may become first call to it for this request with serializing flag set. It occurs if the request is aligned (otherwise, we should already set serializing flag before calling bdrv_aligned_pwritev and correspondingly waited for all intersecting requests). However, for aligned requests, we should not care about outdating of previously read data, as there no such data. Therefore, let's just update an assert to not care about aligned requests. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-07-10block: split flags in copy_rangeVladimir Sementsov-Ogievskiy
Pass read flags and write flags separately. This is needed to handle coming BDRV_REQ_NO_SERIALISING clearly in following patches. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-07-10block/io: fix copy_rangeVladimir Sementsov-Ogievskiy
Here two things are fixed: 1. Architecture On each recursion step, we go to the child of src or dst, only for one of them. So, it's wrong to create tracked requests for both on each step. It leads to tracked requests duplication. 2. Wait for serializing requests on write path independently of BDRV_REQ_NO_SERIALISING Before commit 9ded4a01149 "backup: Use copy offloading", BDRV_REQ_NO_SERIALISING was used for only one case: read in copy-on-write operation during backup. Also, the flag was handled only on read path (in bdrv_co_preadv and bdrv_aligned_preadv). After 9ded4a01149, flag is used for not waiting serializing operations on backup target (in same case of copy-on-write operation). This behavior change is unsubstantiated and potentially dangerous, let's drop it and add additional asserts and documentation. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-07-10block: Poll after drain on attaching a nodeKevin Wolf
Commit dcf94a23b1 ('block: Don't poll in parent drain callbacks') removed polling in bdrv_child_cb_drained_begin() on the grounds that the original bdrv_drain() already will poll and BdrvChildRole.drained_begin calls must not cause graph changes (and therefore must not call aio_poll() or the recursion through the graph will break. This reasoning is correct for calls through bdrv_do_drained_begin(). However, BdrvChildRole.drained_begin is also called when a node that is already in a drained section (i.e. bdrv_do_drained_begin() has already returned and therefore can't poll any more) is attached to a new parent. In this case, we must explicitly poll to have all requests completed before the drained new child can be attached to the parent. In bdrv_replace_child_noperm(), we know that we're not inside the recursion of bdrv_do_drained_begin() because graph changes are not allowed there, and bdrv_replace_child_noperm() is a graph change. The call of BdrvChildRole.drained_begin() must therefore be followed by a BDRV_POLL_WHILE() that waits for the completion of requests. Reported-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-07-05block: Move two block permission constants to the relevant enumAri Sundholm
This allows using the two constants outside of block.c, which will happen in a subsequent patch. Signed-off-by: Ari Sundholm <ari@tuxera.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-07-02block: Honour BDRV_REQ_NO_SERIALISING in copy rangeFam Zheng
This semantics is needed by drive-backup so implement it before using this API there. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 20180703023758.14422-3-famz@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>
2018-06-29block: Remove unused sector-based vectored I/OEric Blake
We are gradually moving away from sector-based interfaces, towards byte-based. Now that all callers of vectored I/O have been converted to use our preferred byte-based bdrv_co_p{read,write}v(), we can delete the unused bdrv_co_{read,write}v(). Furthermore, this gets rid of the signature difference between the public bdrv_co_writev() and the callback .bdrv_co_writev (the latter still exists, because some drivers still need more work before they are fully byte-based). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-29block: Convert .bdrv_truncate callback to coroutine_fnKevin Wolf
bdrv_truncate() is an operation that can block (even for a quite long time, depending on the PreallocMode) in I/O paths that shouldn't block. Convert it to a coroutine_fn so that we have the infrastructure for drivers to make their .bdrv_co_truncate implementation asynchronous. This change could potentially introduce new race conditions because bdrv_truncate() isn't necessarily executed atomically any more. Whether this is a problem needs to be evaluated for each block driver that supports truncate: * file-posix/win32, gluster, iscsi, nfs, rbd, ssh, sheepdog: The protocol drivers are trivially safe because they don't actually yield yet, so there is no change in behaviour. * copy-on-read, crypto, raw-format: Essentially just filter drivers that pass the request to a child node, no problem. * qcow2: The implementation modifies metadata, so it needs to hold s->lock to be safe with concurrent I/O requests. In order to avoid double locking, this requires pulling the locking out into preallocate_co() and using qcow2_write_caches() instead of bdrv_flush(). * qed: Does a single header update, this is fine without locking. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-18block: Allow graph changes in bdrv_drain_all_begin/end sectionsKevin Wolf
bdrv_drain_all_*() used bdrv_next() to iterate over all root nodes and did a subtree drain for each of them. This works fine as long as the graph is static, but sadly, reality looks different. If the graph changes so that root nodes are added or removed, we would have to compensate for this. bdrv_next() returns each root node only once even if it's the root node for multiple BlockBackends or for a monitor-owned block driver tree, which would only complicate things. The much easier and more obviously correct way is to fundamentally change the way the functions work: Iterate over all BlockDriverStates, no matter who owns them, and drain them individually. Compensation is only necessary when a new BDS is created inside a drain_all section. Removal of a BDS doesn't require any action because it's gone afterwards anyway. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18block: ignore_bds_parents parameter for drain functionsKevin Wolf
In the future, bdrv_drained_all_begin/end() will drain all invidiual nodes separately rather than whole subtrees. This means that we don't want to propagate the drain to all parents any more: If the parent is a BDS, it will already be drained separately. Recursing to all parents is unnecessary work and would make it an O(n²) operation. Prepare the drain function for the changed drain_all by adding an ignore_bds_parents parameter to the internal implementation that prevents the propagation of the drain to BDS parents. We still (have to) propagate it to non-BDS parents like BlockBackends or Jobs because those are not drained separately. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18block: Don't poll in parent drain callbacksKevin Wolf
bdrv_do_drained_begin() is only safe if we have a single BDRV_POLL_WHILE() after quiescing all affected nodes. We cannot allow that parent callbacks introduce a nested polling loop that could cause graph changes while we're traversing the graph. Split off bdrv_do_drained_begin_quiesce(), which only quiesces a single node without waiting for its requests to complete. These requests will be waited for in the BDRV_POLL_WHILE() call down the call chain. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18block: Drain recursively with a single BDRV_POLL_WHILE()Kevin Wolf
Anything can happen inside BDRV_POLL_WHILE(), including graph changes that may interfere with its callers (e.g. child list iteration in recursive callers of bdrv_do_drained_begin). Switch to a single BDRV_POLL_WHILE() call for the whole subtree at the end of bdrv_do_drained_begin() to avoid such effects. The recursion happens now inside the loop condition. As the graph can only change between bdrv_drain_poll() calls, but not inside of it, doing the recursion here is safe. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18block: Really pause block jobs on drainKevin Wolf
We already requested that block jobs be paused in .bdrv_drained_begin, but no guarantee was made that the job was actually inactive at the point where bdrv_drained_begin() returned. This introduces a new callback BdrvChildRole.bdrv_drained_poll() and uses it to make bdrv_drain_poll() consider block jobs using the node to be drained. For the test case to work as expected, we have to switch from block_job_sleep_ns() to qemu_co_sleep_ns() so that the test job is even considered active and must be waited for when draining the node. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-11block: Make bdrv_is_writable() publicMax Reitz
This is a useful function for the whole block layer, so make it public. At the same time, users outside of block.c probably do not need to make use of the reopen functionality, so rename the current function to bdrv_is_writable_after_reopen() create a new bdrv_is_writable() function that just passes NULL to it for the reopen queue. Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20180606193702.7113-2-mreitz@redhat.com Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-11block: Add Error parameter to bdrv_amend_optionsMax Reitz
Looking at the qcow2 code that is riddled with error_report() calls, this is really how it should have been from the start. Along the way, turn the target_version/current_version comparisons at the beginning of qcow2_downgrade() into assertions (the caller has to make sure these conditions are met), and rephrase the error message on using compat=1.1 to get refcount widths other than 16 bits. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20180509210023.20283-3-mreitz@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-01block: Introduce API for copy offloadingFam Zheng
Introduce the bdrv_co_copy_range() API for copy offloading. Block drivers implementing this API support efficient copy operations that avoid reading each block from the source device and writing it to the destination devices. Examples of copy offload primitives are SCSI EXTENDED COPY and Linux copy_file_range(2). Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20180601092648.24614-2-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-05-15block: Add BDRV_REQ_WRITE_UNCHANGED flagMax Reitz
This flag signifies that a write request will not change the visible disk content. With this flag set, it is sufficient to have the BLK_PERM_WRITE_UNCHANGED permission instead of BLK_PERM_WRITE. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20180421132929.21610-4-mreitz@redhat.com Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-05-15block: BLK_PERM_WRITE includes ..._UNCHANGEDMax Reitz
Currently we never actually check whether the WRITE_UNCHANGED permission has been taken for unchanging writes. But the one check that is commented out checks both WRITE and WRITE_UNCHANGED; and considering that WRITE_UNCHANGED is already documented as being weaker than WRITE, we should probably explicitly document WRITE to include WRITE_UNCHANGED. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20180421132929.21610-3-mreitz@redhat.com Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-03-09block: Make bdrv_is_whitelisted() publicKevin Wolf
We'll use a separate source file for image creation, and we need to check there whether the requested driver is whitelisted. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2018-03-09qcow2: Use BlockdevRef in qcow2_co_create()Kevin Wolf
Instead of passing a separate BlockDriverState* into qcow2_co_create(), make use of the BlockdevRef that is included in BlockdevCreateOptions. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-03-06Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into stagingPeter Maydell
Block layer patches # gpg: Signature made Mon 05 Mar 2018 17:45:51 GMT # gpg: using RSA key 7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (38 commits) block: Fix NULL dereference on empty drive error qcow2: Replace align_offset() with ROUND_UP() block/ssh: Add basic .bdrv_truncate() block/ssh: Make ssh_grow_file() blocking block/ssh: Pull ssh_grow_file() from ssh_create() qemu-img: Make resize error message more general qcow2: make qcow2_co_create2() a coroutine_fn block: rename .bdrv_create() to .bdrv_co_create_opts() Revert "IDE: Do not flush empty CDROM drives" block: test blk_aio_flush() with blk->root == NULL block: add BlockBackend->in_flight counter block: extract AIO_WAIT_WHILE() from BlockDriverState aio: rename aio_context_in_iothread() to in_aio_context_home_thread() docs: document how to use the l2-cache-entry-size parameter specs/qcow2: Fix documentation of the compressed cluster descriptor iotest 033: add misaligned write-zeroes test via truncate block: fix write with zero flag set and iovector provided block: Drop unused .bdrv_co_get_block_status() vvfat: Switch to .bdrv_co_block_status() vpc: Switch to .bdrv_co_block_status() ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org> # Conflicts: # include/block/block.h
2018-03-02Include less of the generated modular QAPI headersMarkus Armbruster
In my "build everything" tree, a change to the types in qapi-schema.json triggers a recompile of about 4800 out of 5100 objects. The previous commit split up qmp-commands.h, qmp-event.h, qmp-visit.h, qapi-types.h. Each of these headers still includes all its shards. Reduce compile time by including just the shards we actually need. To illustrate the benefits: adding a type to qapi/migration.json now recompiles some 2300 instead of 4800 objects. The next commit will improve it further. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180211093607.27351-24-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> [eblake: rebase to master] Signed-off-by: Eric Blake <eblake@redhat.com>
2018-03-02block: extract AIO_WAIT_WHILE() from BlockDriverStateStefan Hajnoczi
BlockDriverState has the BDRV_POLL_WHILE() macro to wait on event loop activity while a condition evaluates to true. This is used to implement synchronous operations where it acts as a condvar between the IOThread running the operation and the main loop waiting for the operation. It can also be called from the thread that owns the AioContext and in that case it's just a nested event loop. BlockBackend needs this behavior but doesn't always have a BlockDriverState it can use. This patch extracts BDRV_POLL_WHILE() into the AioWait abstraction, which can be used with AioContext and isn't tied to BlockDriverState anymore. This feature could be built directly into AioContext but then all users would kick the event loop even if they signal different conditions. Imagine an AioContext with many BlockDriverStates, each time a request completes any waiter would wake up and re-check their condition. It's nicer to keep a separate AioWait object for each condition instead. Please see "block/aio-wait.h" for details on the API. The name AIO_WAIT_WHILE() avoids the confusion between AIO_POLL_WHILE() and AioContext polling. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-02aio: rename aio_context_in_iothread() to in_aio_context_home_thread()Stefan Hajnoczi
The name aio_context_in_iothread() is misleading because it also returns true when called on the main AioContext from the main loop thread, which is not an IOThread. This patch renames it to in_aio_context_home_thread() and expands the doc comment to make the semantics clearer. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-02block: Add .bdrv_co_block_status() callbackEric Blake
We are gradually moving away from sector-based interfaces, towards byte-based. Now that the block layer exposes byte-based allocation, it's time to tackle the drivers. Add a new callback that operates on as small as byte boundaries. Subsequent patches will then update individual drivers, then finally remove .bdrv_co_get_block_status(). The new code also passes through the 'want_zero' hint, which will allow subsequent patches to further optimize callers that only care about how much of the image is allocated (want_zero is false), rather than full details about runs of zeroes and which offsets the allocation actually maps to (want_zero is true). As part of this effort, fix another part of the documentation: the claim in commit 4c41cb4 that BDRV_BLOCK_ALLOCATED is short for 'DATA || ZERO' is a lie at the block layer (see commit e88ae2264), even though it is how the bit is computed from the driver layer. After all, there are intentionally cases where we return ZERO but not ALLOCATED at the block layer, when we know that a read sees zero because the backing file is too short. Note that the driver interface is thus slightly different than the public interface with regards to which bits will be set, and what guarantees are provided on input. We also add an assertion that any driver using the new callback will make progress (the only time pnum will be 0 is if the block layer already handled an out-of-bounds request, or if there is an error); the old driver interface did not provide this guarantee, which could lead to some inf-loops in drastic corner-case failures. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-02-09block: Simplify bdrv_can_write_zeroes_with_unmap()Eric Blake
We don't need the can_write_zeroes_with_unmap field in BlockDriverInfo, because it is redundant information with supported_zero_flags & BDRV_REQ_MAY_UNMAP. Note that BlockDriverInfo and supported_zero_flags are both per-device settings, rather than global state about the driver as a whole, which means one or both of these bits of information can already be conditional. Let's audit how they were set: crypto: always setting can_write_ to false is pointless (the struct starts life zero-initialized), no use of supported_ nbd: just recently fixed to set can_write_ if supported_ includes MAY_UNMAP (thus this commit effectively reverts bca80059e and solves the problem mentioned there in a more global way) file-posix, iscsi, qcow2: can_write_ is conditional, while supported_ was unconditional; but passing MAY_UNMAP would fail with ENOTSUP if the condition wasn't met qed: can_write_ is unconditional, but pwrite_zeroes lacks support for MAY_UNMAP and supported_ is not set. Perhaps support can be added later (since it would be similar to qcow2), but for now claiming false is no real loss all other drivers: can_write_ is not set, and supported_ is either unset or a passthrough Simplify the code by moving the conditional into supported_zero_flags for all drivers, then dropping the now-unused BDI field. For callers that relied on bdrv_can_write_zeroes_with_unmap(), we return the same per-device settings for drivers that had conditions (no observable change in behavior there); and can now return true (instead of false) for drivers that support passthrough (for example, the commit driver) which gives those drivers the same fix as nbd just got in bca80059e. For callers that relied on supported_zero_flags, we now have a few more places that can avoid a wasted call to pwrite_zeroes() that will just fail with ENOTSUP. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20180126193439.20219-1-eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-02-09Move include qemu/option.h from qemu-common.h to actual usersMarkus Armbruster
qemu-common.h includes qemu/option.h, but most places that include the former don't actually need the latter. Drop the include, and add it to the places that actually need it. While there, drop superfluous includes of both headers, and separate #include from file comment with a blank line. This cleanup makes the number of objects depending on qemu/option.h drop from 4545 (out of 4743) to 284 in my "build everything" tree. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180201111846.21846-20-armbru@redhat.com> [Semantic conflict with commit bdd6a90a9e in block/nvme.c resolved]
2018-02-09Include qapi/qmp/qdict.h exactly where neededMarkus Armbruster
This cleanup makes the number of objects depending on qapi/qmp/qdict.h drop from 4550 (out of 4743) to 368 in my "build everything" tree. For qapi/qmp/qobject.h, the number drops from 4552 to 390. While there, separate #include from file comment with a blank line. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180201111846.21846-13-armbru@redhat.com>
2018-02-09Include qapi/qmp/qobject.h exactly where neededMarkus Armbruster
Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180201111846.21846-11-armbru@redhat.com>
2018-02-09Drop superfluous includes of qapi-types.h and test-qapi-types.hMarkus Armbruster
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180201111846.21846-4-armbru@redhat.com>
2018-02-08block: Introduce buf register APIFam Zheng
Allow block driver to map and unmap a buffer for later I/O, as a performance hint. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20180116060901.17413-5-famz@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>
2017-12-22block: Allow graph changes in subtree drained sectionKevin Wolf
We need to remember how many of the drain sections in which a node is were recursive (i.e. subtree drain rather than node drain), so that they can be correctly applied when children are added or removed during the drained section. With this change, it is safe to modify the graph even inside a bdrv_subtree_drained_begin/end() section. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-12-22block: Add bdrv_subtree_drained_begin/end()Kevin Wolf
bdrv_drained_begin() waits for the completion of requests in the whole subtree, but it only actually keeps its immediate bs parameter quiesced until bdrv_drained_end(). Add a version that keeps the whole subtree drained. As of this commit, graph changes cannot be allowed during a subtree drained section, but this will be fixed soon. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-12-22block: Don't notify parents in drain call chainKevin Wolf
This is in preparation for subtree drains, i.e. drained sections that affect not only a single node, but recursively all child nodes, too. Calling the parent callbacks for drain is pointless when we just came from that parent node recursively and leads to multiple increases of bs->quiesce_counter in a single drain call. Don't do it. In order for this to work correctly, the parent callback must be called for every bdrv_drain_begin/end() call, not only for the outermost one: If we have a node N with two parents A and B, recursive draining of A should cause the quiesce_counter of B to increase because its child N is drained independently of B. If now B is recursively drained, too, A must increase its quiesce_counter because N is drained independently of A only now, even if N is going from quiesce_counter 1 to 2. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-11-17block: Make bdrv_next() keep strong referencesMax Reitz
On one hand, it is a good idea for bdrv_next() to return a strong reference because ideally nearly every pointer should be refcounted. This fixes intermittent failure of iotest 194. On the other, it is absolutely necessary for bdrv_next() itself to keep a strong reference to both the BB (in its first phase) and the BDS (at least in the second phase) because when called the next time, it will dereference those objects to get a link to the next one. Therefore, it needs these objects to stay around until then. Just storing the pointer to the next in the iterator is not really viable because that pointer might become invalid as well. Both arguments taken together means we should probably just invoke bdrv_ref() and blk_ref() in bdrv_next(). This means we have to assert that bdrv_next() is always called from the main loop, but that was probably necessary already before this patch and judging from the callers, it also looks to actually be the case. Keeping these strong references means however that callers need to give them up if they decide to abort the iteration early. They can do so through the new bdrv_next_cleanup() function. Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20171110172545.32609-1-mreitz@redhat.com Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-26block: Convert bdrv_get_block_status_above() to bytesEric Blake
We are gradually moving away from sector-based interfaces, towards byte-based. In the common case, allocation is unlikely to ever use values that are not naturally sector-aligned, but it is possible that byte-based values will let us be more precise about allocation at the end of an unaligned file that can do byte-based access. Changing the name of the function from bdrv_get_block_status_above() to bdrv_block_status_above() ensures that the compiler enforces that all callers are updated. Likewise, since it a byte interface allows an offset mapping that might not be sector aligned, split the mapping out of the return value and into a pass-by-reference parameter. For now, the io.c layer still assert()s that all uses are sector-aligned, but that can be relaxed when a later patch implements byte-based block status in the drivers. For the most part this patch is just the addition of scaling at the callers followed by inverse scaling at bdrv_block_status(), plus updates for the new split return interface. But some code, particularly bdrv_block_status(), gets a lot simpler because it no longer has to mess with sectors. Likewise, mirror code no longer computes s->granularity >> BDRV_SECTOR_BITS, and can therefore drop an assertion about alignment because the loop no longer depends on alignment (never mind that we don't really have a driver that reports sub-sector alignments, so it's not really possible to test the effect of sub-sector mirroring). Fix a neighboring assertion to use is_power_of_2 while there. For ease of review, bdrv_get_block_status() was tackled separately. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-26block: Convert bdrv_get_block_status() to bytesEric Blake
We are gradually moving away from sector-based interfaces, towards byte-based. In the common case, allocation is unlikely to ever use values that are not naturally sector-aligned, but it is possible that byte-based values will let us be more precise about allocation at the end of an unaligned file that can do byte-based access. Changing the name of the function from bdrv_get_block_status() to bdrv_block_status() ensures that the compiler enforces that all callers are updated. For now, the io.c layer still assert()s that all callers are sector-aligned, but that can be relaxed when a later patch implements byte-based block status in the drivers. There was an inherent limitation in returning the offset via the return value: we only have room for BDRV_BLOCK_OFFSET_MASK bits, which means an offset can only be mapped for sector-aligned queries (or, if we declare that non-aligned input is at the same relative position modulo 512 of the answer), so the new interface also changes things to return the offset via output through a parameter by reference rather than mashed into the return value. We'll have some glue code that munges between the two styles until we finish converting all uses. For the most part this patch is just the addition of scaling at the callers followed by inverse scaling at bdrv_block_status(), coupled with the tweak in calling convention. But some code, particularly bdrv_is_allocated(), gets a lot simpler because it no longer has to mess with sectors. For ease of review, bdrv_get_block_status_above() will be tackled separately. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-26block: Make bdrv_round_to_clusters() signature more usefulEric Blake
In the process of converting sector-based interfaces to bytes, I'm finding it easier to represent a byte count as a 64-bit integer at the block layer (even if we are internally capped by SIZE_MAX or even INT_MAX for individual transactions, it's still nicer to not have to worry about truncation/overflow issues on as many variables). Update the signature of bdrv_round_to_clusters() to uniformly use int64_t, matching the signature already chosen for bdrv_is_allocated and the fact that off_t is also a signed type, then adjust clients according to the required fallout (even where the result could now exceed 32 bits, no client is directly assigning the result into a 32-bit value without breaking things into a loop first). Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06commit: Remove overlay_bsKevin Wolf
We don't need to make any assumptions about the graph layout above the top node of the commit operation any more. Remove the use of bdrv_find_overlay() and related variables from the commit job code. bdrv_drop_intermediate() doesn't use the 'active' parameter any more, so we can just drop it. The overlay node was previously added to the block job to get a BLK_PERM_GRAPH_MOD. We really need to respect those permissions in bdrv_drop_intermediate() now, but as long as we haven't figured out yet how BLK_PERM_GRAPH_MOD is actually supposed to work, just leave a TODO comment there. With this change, it is now possible to perform another block job on an overlay node without conflicts. qemu-iotests 030 is changed accordingly. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2017-09-26block: Fix permissions after bdrv_reopen()Kevin Wolf
If we switch between read-only and read-write, the permissions that image format drivers need on bs->file change, too. Make sure to update the permissions during bdrv_reopen(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2017-09-26block: Base permissions on rw state after reopenKevin Wolf
When new permissions are calculated during bdrv_reopen(), they need to be based on the state of the graph as it will be after the reopen has completed, not on the current state of the involved nodes. This patch makes bdrv_is_writable() optionally accept a BlockReopenQueue from which the new flags are taken. This is then used for determining the new bs->file permissions of format drivers as soon as we add the code to actually pass a non-NULL reopen queue to the .bdrv_child_perm callbacks. While moving bdrv_is_writable(), make it static. It isn't used outside block.c. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2017-09-04block: remove unused bdrv_media_changedManos Pitsidianakis
This function is not used anywhere, so remove it. Markus Armbruster adds: The i82078 floppy device model used to call bdrv_media_changed() to implement its media change bit when backed by a host floppy. This went away in 21fcf36 "fdc: simplify media change handling". Probably broke host floppy media change. Host floppy pass-through was dropped in commit f709623. bdrv_media_changed() has never been used for anything else. Remove it. (Source is Message-ID: <87y3ruaypm.fsf@dusky.pond.sub.org>) Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-08-08block: Allow reopen rw without BDRV_O_ALLOW_RDWRKevin Wolf
BDRV_O_ALLOW_RDWR is a flag that tells whether qemu can internally reopen a node read-write temporarily because the user requested read-write for the top-level image, but qemu decided that read-only is enough for this node (a backing file). bdrv_reopen() is different, it is also used for cases where the user changed their mind and wants to update the options. There is no reason to forbid making a node read-write in that case. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>
2017-07-24block: Skip implicit nodes in query-block/blockstatsKevin Wolf
Commits 0db832f and 6cdbceb introduced the automatic insertion of filter nodes above the top layer of mirror and commit block jobs. The assumption made there was that since libvirt doesn't do node-level management of the block layer yet, it shouldn't be affected by added nodes. This is true as far as commands issued by libvirt are concerned. It only uses BlockBackend names to address nodes, so any operations it performs still operate on the root of the tree as intended. However, the assumption breaks down when you consider query commands, which return data for the wrong node now. These commands also return information on some child nodes (bs->file and/or bs->backing), which libvirt does make use of, and which refer to the wrong nodes, too. One of the consequences is that oVirt gets wrong information about the image size and stops the VM in response as long as a mirror or commit job is running: https://bugzilla.redhat.com/show_bug.cgi?id=1470634 This patch fixes the problem by hiding the implicit nodes created automatically by the mirror and commit block jobs in the output of query-block and BlockBackend-based query-blockstats as long as the user doesn't indicate that they are aware of those nodes by providing a node name for them in the QMP command to start the block job. The node-based commands query-named-block-nodes and query-blockstats with query-nodes=true still show all nodes, including implicit ones. This ensures that users that are capable of node-level management can still access the full information; users that only know BlockBackends won't use these commands. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Tested-by: Eric Blake <eblake@redhat.com>
2017-07-11block: Add PreallocMode to bdrv_truncate()Max Reitz
For block drivers that just pass a truncate request to the underlying protocol, we can now pass the preallocation mode instead of aborting if it is not PREALLOC_MODE_OFF. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20170613202107.10125-3-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-07-11block: add bdrv_measure() APIStefan Hajnoczi
bdrv_measure() provides a conservative maximum for the size of a new image. This information is handy if storage needs to be allocated (e.g. a SAN or an LVM volume) ahead of time. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20170705125738.8777-2-stefanha@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-07-11block: add bdrv_can_store_new_dirty_bitmapVladimir Sementsov-Ogievskiy
This will be needed to check some restrictions before making bitmap persistent in qmp-block-dirty-bitmap-add (this functionality will be added by future patch) Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20170628120530.31251-22-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-07-11block: remove all encryption handling APIsDaniel P. Berrange
Now that all encryption keys must be provided upfront via the QCryptoSecret API and associated block driver properties there is no need for any explicit encryption handling APIs in the block layer. Encryption can be handled transparently within the block driver. We only retain an API for querying whether an image is encrypted or not, since that is a potentially useful piece of metadata to report to the user. Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170623162419.26068-18-berrange@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-07-10block: Make bdrv_is_allocated_above() byte-basedEric Blake
We are gradually moving away from sector-based interfaces, towards byte-based. In the common case, allocation is unlikely to ever use values that are not naturally sector-aligned, but it is possible that byte-based values will let us be more precise about allocation at the end of an unaligned file that can do byte-based access. Changing the signature of the function to use int64_t *pnum ensures that the compiler enforces that all callers are updated. For now, the io.c layer still assert()s that all callers are sector-aligned, but that can be relaxed when a later patch implements byte-based block status. Therefore, for the most part this patch is just the addition of scaling at the callers followed by inverse scaling at bdrv_is_allocated(). But some code, particularly stream_run(), gets a lot simpler because it no longer has to mess with sectors. Leave comments where we can further simplify by switching to byte-based iterations, once later patches eliminate the need for sector-aligned operations. For ease of review, bdrv_is_allocated() was tackled separately. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>