aboutsummaryrefslogtreecommitdiff
path: root/block.c
AgeCommit message (Collapse)Author
2020-03-06block: bdrv_reopen() with backing file in different AioContextKevin Wolf
This patch allows bdrv_reopen() (and therefore the x-blockdev-reopen QMP command) to attach a node as the new backing file even if the node is in a different AioContext than the parent if one of both nodes can be moved to the AioContext of the other node. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Tested-by: Peter Krempa <pkrempa@redhat.com> Message-Id: <20200306141413.30705-3-kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-03-06block: Introduce 'bdrv_reopen_commit_post' stepPeter Krempa
Add another step in the reopen process where driver can execute code after permission changes are comitted. Signed-off-by: Peter Krempa <pkrempa@redhat.com> Message-Id: <adc02cf591c3cb34e98e33518eb1c540a0f27db1.1582893284.git.pkrempa@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-03-06block: Fix leak in bdrv_create_file_fallback()Max Reitz
@options is leaked by the first two return statements in this function. Note that blk_new_open() takes the reference to @options even on failure, so all we need to do to fix the leak is to move the QDict allocation down to where we actually need it. Reported-by: Coverity (CID 1419884) Fixes: fd17146cd93d1704cd96d7c2757b325fc7aac6fd ("block: Generic file creation fallback") Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200225155618.133412-1-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-02-22qemu/queue.h: add QLIST_SAFE_REMOVE()Stefan Hajnoczi
QLIST_REMOVE() assumes the element is in a list. It also leaves the element's linked list pointers dangling. Introduce a safe version of QLIST_REMOVE() and convert open-coded instances of this pattern. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Message-id: 20200214171712.541358-4-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-02-20block: Generic file creation fallbackMax Reitz
If a protocol driver does not support image creation, we can see whether maybe the file exists already. If so, just truncating it will be sufficient. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200122164532.178040-3-mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-02-20qapi: Allow getting flat output from 'query-named-block-nodes'Peter Krempa
When a management application manages node names there's no reason to recurse into backing images in the output of query-named-block-nodes. Add a parameter to the command which will return just the top level structs. Signed-off-by: Peter Krempa <pkrempa@redhat.com> Message-Id: <4470f8c779abc404dcf65e375db195cd91a80651.1579509782.git.pkrempa@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> [mreitz: Fixed coding style] Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-02-18block: Remove bdrv_recurse_is_first_non_filter()Max Reitz
It no longer has any users. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200218103454.296704-11-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-02-18block: Use bdrv_recurse_can_replace()Max Reitz
Let check_to_replace_node() use the more specialized bdrv_recurse_can_replace() instead of bdrv_recurse_is_first_non_filter(), which is too restrictive (or, in the case of quorum, sometimes not restrictive enough). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200218103454.296704-10-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-02-18block: Add bdrv_recurse_can_replace()Max Reitz
After a couple of follow-up patches, this function will replace bdrv_recurse_is_first_non_filter() in check_to_replace_node(). bdrv_recurse_is_first_non_filter() is both not sufficiently specific for check_to_replace_node() (it allows cases that should not be allowed, like replacing child nodes of quorum with dissenting data that have more parents than just quorum), and it is too restrictive (it is perfectly fine to replace filters). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200218103454.296704-7-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-02-18block: Drop bdrv_is_first_non_filter()Max Reitz
It is unused now. (And it was ugly because it needed to explore all BDS chains from the top.) Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200218103454.296704-4-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-02-18block: Remove superfluous semicolonsPhilippe Mathieu-Daudé
Fixes: 132ada80c4a Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200218094402.26625-4-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-02-06block: fix memleaks in bdrv_refresh_filenamePan Nengyuan
If we call the qmp 'query-block' while qemu is working on 'block-commit', it will cause memleaks, the memory leak stack is as follow: Indirect leak of 12360 byte(s) in 3 object(s) allocated from: #0 0x7f80f0b6d970 in __interceptor_calloc (/lib64/libasan.so.5+0xef970) #1 0x7f80ee86049d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5249d) #2 0x55ea95b5bb67 in qdict_new /mnt/sdb/qemu-4.2.0-rc0/qobject/qdict.c:29 #3 0x55ea956cd043 in bdrv_refresh_filename /mnt/sdb/qemu-4.2.0-rc0/block.c:6427 #4 0x55ea956cc950 in bdrv_refresh_filename /mnt/sdb/qemu-4.2.0-rc0/block.c:6399 #5 0x55ea956cc950 in bdrv_refresh_filename /mnt/sdb/qemu-4.2.0-rc0/block.c:6399 #6 0x55ea956cc950 in bdrv_refresh_filename /mnt/sdb/qemu-4.2.0-rc0/block.c:6399 #7 0x55ea958818ea in bdrv_block_device_info /mnt/sdb/qemu-4.2.0-rc0/block/qapi.c:56 #8 0x55ea958879de in bdrv_query_info /mnt/sdb/qemu-4.2.0-rc0/block/qapi.c:392 #9 0x55ea9588b58f in qmp_query_block /mnt/sdb/qemu-4.2.0-rc0/block/qapi.c:578 #10 0x55ea95567392 in qmp_marshal_query_block qapi/qapi-commands-block-core.c:95 Indirect leak of 4120 byte(s) in 1 object(s) allocated from: #0 0x7f80f0b6d970 in __interceptor_calloc (/lib64/libasan.so.5+0xef970) #1 0x7f80ee86049d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5249d) #2 0x55ea95b5bb67 in qdict_new /mnt/sdb/qemu-4.2.0-rc0/qobject/qdict.c:29 #3 0x55ea956cd043 in bdrv_refresh_filename /mnt/sdb/qemu-4.2.0-rc0/block.c:6427 #4 0x55ea956cc950 in bdrv_refresh_filename /mnt/sdb/qemu-4.2.0-rc0/block.c:6399 #5 0x55ea956cc950 in bdrv_refresh_filename /mnt/sdb/qemu-4.2.0-rc0/block.c:6399 #6 0x55ea9569f301 in bdrv_backing_attach /mnt/sdb/qemu-4.2.0-rc0/block.c:1064 #7 0x55ea956a99dd in bdrv_replace_child_noperm /mnt/sdb/qemu-4.2.0-rc0/block.c:2283 #8 0x55ea956b9b53 in bdrv_replace_node /mnt/sdb/qemu-4.2.0-rc0/block.c:4196 #9 0x55ea956b9e49 in bdrv_append /mnt/sdb/qemu-4.2.0-rc0/block.c:4236 #10 0x55ea958c3472 in commit_start /mnt/sdb/qemu-4.2.0-rc0/block/commit.c:306 #11 0x55ea94b68ab0 in qmp_block_commit /mnt/sdb/qemu-4.2.0-rc0/blockdev.c:3459 #12 0x55ea9556a7a7 in qmp_marshal_block_commit qapi/qapi-commands-block-core.c:407 Fixes: bb808d5f5c0978828a974d547e6032402c339555 Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com> Message-id: 20200116085600.24056-1-pannengyuan@huawei.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-02-06block: Use a GString in bdrv_perm_names()Alberto Garcia
This is a bit more efficient than having to allocate and free memory for each new permission. The default size (30) is enough for "consistent read, write, resize". Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: 20200110171518.22168-1-berto@igalia.com Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-01-30blockdev: adds bdrv_parse_aio to use io_uringAarushi Mehta
Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200120141858.587874-8-stefanha@redhat.com Message-Id: <20200120141858.587874-8-stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-01-06block: Use bdrv_qapi_perm_to_blk_perm()Max Reitz
We can save some LoC in xdbg_graph_add_edge() by using bdrv_qapi_perm_to_blk_perm(). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20191108123455.39445-3-mreitz@redhat.com Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-01-06block: Add bdrv_qapi_perm_to_blk_perm()Max Reitz
We need some way to correlate QAPI BlockPermission values with BLK_PERM_* flags. We could: (1) have the same order in the QAPI definition as the the BLK_PERM_* flags are in LSb-first order. However, then there is no guarantee that they actually match (e.g. when someone modifies the QAPI schema without thinking of the BLK_PERM_* definitions). We could add static assertions, but these would break what’s good about this solution, namely its simplicity. (2) define the BLK_PERM_* flags based on the BlockPermission values. But this way whenever someone were to modify the QAPI order (perfectly sensible in theory), the BLK_PERM_* values would change. Because these values are used for file locking, this might break file locking between different qemu versions. Therefore, go the slightly more cumbersome way: Add a function to translate from the QAPI constants to the BLK_PERM_* flags. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20191108123455.39445-2-mreitz@redhat.com Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-12-19block: Activate recursively even for already active nodesKevin Wolf
bdrv_invalidate_cache_all() assumes that all nodes in a given subtree are either active or inactive when it starts. Therefore, as soon as it arrives at an already active node, it stops. However, this assumption is wrong. For example, it's possible to take a snapshot of an inactive node, which results in an active overlay over an inactive backing file. The active overlay is probably also the root node of an inactive BlockBackend (blk->disable_perm == true). In this case, bdrv_invalidate_cache_all() does not need to do anything to activate the overlay node, but it still needs to recurse into the children and the parents to make sure that after returning success, really everything is activated. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2019-12-18block: Error out on image creation with conflicting size optionsKevin Wolf
If both the create options (qemu-img create -o ...) and the size parameter were given, the size parameter was silently ignored. Instead, make specifying two sizes an error. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2019-11-18block: Remove 'backing': null from bs->{explicit_,}optionsKevin Wolf
bs->options and bs->explicit_options shouldn't contain any options for child nodes. bdrv_open_inherited() takes care to remove any options that match a child name after opening the image and the same is done when reopening. However, we miss the case of 'backing': null, which is a child option, but results in no child being created. This means that a 'backing': null remains in bs->options and bs->explicit_options. A typical use for 'backing': null is in live snapshots: blockdev-add for the qcow2 overlay makes sure not to open the backing file (because it is already opened and blockdev-snapshot will attach it). After doing a blockdev-snapshot, bs->options and bs->explicit_options become inconsistent with the actual state (bs has a backing file now, but the options still say null). On the next occasion that the image is reopened, e.g. switching it from read-write to read-only when another snapshot is taken, the option will take effect again and the node incorrectly loses its backing file. Fix bdrv_open_inherited() to remove the 'backing' option from bs->options and bs->explicit_options even for the case where it specifies that no backing file is wanted. Reported-by: Peter Krempa <pkrempa@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Tested-by: Peter Krempa <pkrempa@redhat.com>
2019-10-26core: replace getpagesize() with qemu_real_host_page_sizeWei Yang
There are three page size in qemu: real host page size host page size target page size All of them have dedicate variable to represent. For the last two, we use the same form in the whole qemu project, while for the first one we use two forms: qemu_real_host_page_size and getpagesize(). qemu_real_host_page_size is defined to be a replacement of getpagesize(), so let it serve the role. [Note] Not fully tested for some arch or device. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20191013021145.16011-3-richardw.yang@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-17qcow2-bitmap: move bitmap reopen-rw code to qcow2_reopen_commitVladimir Sementsov-Ogievskiy
The only reason I can imagine for this strange code at the very-end of bdrv_reopen_commit is the fact that bs->read_only updated after calling drv->bdrv_reopen_commit in bdrv_reopen_commit. And in the same time, prior to previous commit, qcow2_reopen_bitmaps_rw did a wrong check for being writable, when actually it only need writable file child not self. So, as it's fixed, let's move things to correct place. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Acked-by: Max Reitz <mreitz@redhat.com> Message-id: 20190927122355.7344-10-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>
2019-10-17block: reverse order for reopen commitsVladimir Sementsov-Ogievskiy
It's needed to fix reopening qcow2 with bitmaps to RW. Currently it can't work, as qcow2 needs write access to file child, to mark bitmaps in-image with IN_USE flag. But usually children goes after parents in reopen queue and file child is still RO on qcow2 reopen commit. Reverse reopen order to fix it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Acked-by: Max Reitz <mreitz@redhat.com> Acked-by: John Snow <jsnow@redhat.com> Message-id: 20190927122355.7344-3-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>
2019-10-17block: switch reopen queue from QSIMPLEQ to QTAILQVladimir Sementsov-Ogievskiy
We'll need reverse-foreach in the following commit, QTAILQ support it, so move to QTAILQ. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190927122355.7344-2-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>
2019-10-17block/dirty-bitmap: refactor bdrv_dirty_bitmap_nextVladimir Sementsov-Ogievskiy
bdrv_dirty_bitmap_next is always used in same pattern. So, split it into _next and _first, instead of combining two functions into one and add FOR_EACH_DIRTY_BITMAP macro. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190916141911.5255-5-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>
2019-10-17block: move bdrv_can_store_new_dirty_bitmap to block/dirty-bitmap.cVladimir Sementsov-Ogievskiy
block/dirty-bitmap.c seems to be more appropriate for it and bdrv_remove_persistent_dirty_bitmap already in it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190920082543.23444-2-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>
2019-10-10qapi: query-blockstat: add driver specific file-posix statsAnton Nefedov
A block driver can provide a callback to report driver-specific statistics. file-posix driver now reports discard statistics Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Acked-by: Markus Armbruster <armbru@redhat.com> Message-id: 20190923121737.83281-10-anton.nefedov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-10block: teach bdrv_debug_breakpoint skip filters with backingVladimir Sementsov-Ogievskiy
Teach bdrv_debug_breakpoint and bdrv_debug_remove_breakpoint skip filters with backing. This is needed to implement and use in backup job it's own backup_top filter driver (like mirror already has one), and without this improvement, breakpoint removal will fail at least in 55 iotest. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190920142056.12778-9-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-09-03block: fix permission update in bdrv_replace_nodeVladimir Sementsov-Ogievskiy
It's wrong to OR shared permissions. It may lead to crash on further permission updates. Also, no needs to consider previously calculated permissions, as at this point we already bind all new parents and bdrv_get_cumulative_perm result is enough. So fix the bug by just set permissions by bdrv_get_cumulative_perm result. Bug was introduced in long ago 234ac1a9025, in 2.9. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190824100740.61635-1-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-08-19block: Add bdrv_has_zero_init_truncate()Max Reitz
No .bdrv_has_zero_init() implementation returns 1 if growing the file would add non-zero areas (at least with PREALLOC_MODE_OFF), so using it in lieu of this new function was always safe. But on the other hand, it is possible that growing an image that is not zero-initialized would still add a zero-initialized area, like when using nonpreallocating truncation on a preallocated image. For callers that care only about truncation, not about creation with potential preallocation, this new function is useful. Alternatively, we could have added a PreallocMode parameter to bdrv_has_zero_init(). But the only user would have been qemu-img convert, which does not have a plain PreallocMode value right now -- it would have to parse the creation option to obtain it. Therefore, the simpler solution is to let bdrv_has_zero_init() inquire the preallocation status and add the new bdrv_has_zero_init_truncate() that presupposes PREALLOC_MODE_OFF. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190724171239.8764-4-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-08-16qapi: implement block-dirty-bitmap-remove transaction actionJohn Snow
It is used to do transactional movement of the bitmap (which is possible in conjunction with merge command). Transactional bitmap movement is needed in scenarios with external snapshot, when we don't want to leave copy of the bitmap in the base image. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190708220502.12977-3-jsnow@redhat.com [Edited "since" version to 4.2 --js] Signed-off-by: John Snow <jsnow@redhat.com>
2019-08-16Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into stagingPeter Maydell
Block layer patches: - file-posix: Fix O_DIRECT alignment detection - Fixes for concurrent block jobs - block-backend: Queue requests while drained (fix IDE vs. job crashes) - qemu-img convert: Deprecate using -n and -o together - iotests: Migration tests with filter nodes - iotests: More media change tests # gpg: Signature made Fri 16 Aug 2019 10:29:18 BST # gpg: using RSA key 7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: file-posix: Handle undetectable alignment qemu-img convert: Deprecate using -n and -o together block-backend: Queue requests while drained mirror: Keep mirror_top_bs drained after dropping permissions block: Remove blk_pread_unthrottled() iotests: Add test for concurrent stream/commit tests: Test mid-drain bdrv_replace_child_noperm() tests: Test polling in bdrv_drop_intermediate() block: Reduce (un)drains when replacing a child block: Keep subtree drained in drop_intermediate block: Simplify bdrv_filter_default_perms() iotests: Test migration with all kinds of filter nodes iotests: Move migration helpers to iotests.py iotests/118: Add -blockdev based tests iotests/118: Create test classes dynamically iotests/118: Test media change for scsi-cd Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-08-16Include qemu/main-loop.h lessMarkus Armbruster
In my "build everything" tree, changing qemu/main-loop.h triggers a recompile of some 5600 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). It includes block/aio.h, which in turn includes qemu/event_notifier.h, qemu/notify.h, qemu/processor.h, qemu/qsp.h, qemu/queue.h, qemu/thread-posix.h, qemu/thread.h, qemu/timer.h, and a few more. Include qemu/main-loop.h only where it's needed. Touching it now recompiles only some 1700 objects. For block/aio.h and qemu/event_notifier.h, these numbers drop from 5600 to 2800. For the others, they shrink only slightly. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190812052359.30071-21-armbru@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2019-08-16block: Reduce (un)drains when replacing a childMax Reitz
Currently, bdrv_replace_child_noperm() undrains the parent until it is completely undrained, then re-drains it after attaching the new child node. This is a problem with bdrv_drop_intermediate(): We want to keep the whole subtree drained, including parents, while the operation is under way. bdrv_replace_child_noperm() breaks this by allowing every parent to become unquiesced briefly, and then redraining it. In fact, there is no reason why the parent should become unquiesced and be allowed to submit requests to the new child node if that new node is supposed to be kept drained. So if anything, we have to drain the parent before detaching the old child node. Conversely, we have to undrain it only after attaching the new child node. Thus, change the whole drain algorithm here: Calculate the number of times we have to drain/undrain the parent before replacing the child node then drain it (if necessary), replace the child node, and then undrain it. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-08-16block: Keep subtree drained in drop_intermediateMax Reitz
bdrv_drop_intermediate() calls BdrvChildRole.update_filename(). That may poll, thus changing the graph, which potentially breaks the QLIST_FOREACH_SAFE() loop. Just keep the whole subtree drained. This is probably the right thing to do anyway (dropping nodes while the subtree is not drained seems wrong). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-08-16block: Simplify bdrv_filter_default_perms()Kevin Wolf
The same change as commit 2b23f28639 ('block/copy-on-read: Fix permissions for inactive node') made for the copy-on-read driver can be made for bdrv_filter_default_perms(): Retaining the old permissions from the BdrvChild if it is given complicates things unnecessarily when in the end this only means that the options set in the c == NULL case (i.e. during child creation) are retained. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2019-07-22block: Only the main loop can change AioContextsMax Reitz
bdrv_set_aio_context_ignore() can only work in the main loop: bdrv_drained_begin() only works in the main loop and the node's (old) AioContext; and bdrv_drained_end() really only works in the main loop and the node's (new) AioContext (contrary to its current comment, which is just wrong). Consequentially, bdrv_set_aio_context_ignore() must be called from the main loop. Luckily, assuming that we can make block graph changes only from the main loop as well, all its callers do that already. Note that changing a node's context in a sense is an operation that changes the block graph, so it actually makes sense to require this function to be called from the main loop. Also, fix bdrv_drained_end()'s description. You can only use it from the main loop or the node's AioContext, and in the latter case, the whole subtree must be in the same context. Fixes: e037c09c78520cbdb6da7cfc6ad0256d5870b814 Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190722133054.21781-3-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-07-19block: Do not poll in bdrv_do_drained_end()Max Reitz
We should never poll anywhere in bdrv_do_drained_end() (including its recursive callees like bdrv_drain_invoke()), because it does not cope well with graph changes. In fact, it has been written based on the postulation that no graph changes will happen in it. Instead, the callers that want to poll must poll, i.e. all currently globally available wrappers: bdrv_drained_end(), bdrv_subtree_drained_end(), bdrv_unapply_subtree_drain(), and bdrv_drain_all_end(). Graph changes there do not matter. They can poll simply by passing a pointer to a drained_end_counter and wait until it reaches 0. This patch also adds a non-polling global wrapper for bdrv_do_drained_end() that takes a drained_end_counter pointer. We need such a variant because now no function called anywhere from bdrv_do_drained_end() must poll. This includes BdrvChildRole.drained_end(), which already must not poll according to its interface documentation, but bdrv_child_cb_drained_end() just violates that by invoking bdrv_drained_end() (which does poll). Therefore, BdrvChildRole.drained_end() must take a *drained_end_counter parameter, which bdrv_child_cb_drained_end() can pass on to the new bdrv_drained_end_no_poll() function. Note that we now have a pattern of all drained_end-related functions either polling or receiving a *drained_end_counter to let the caller poll based on that. A problem with a single poll loop is that when the drained section in bdrv_set_aio_context_ignore() ends, some nodes in the subgraph may be in the old contexts, while others are in the new context already. To let the collective poll in bdrv_drained_end() work correctly, we must not hold a lock to the old context, so that the old context can make progress in case it is different from the current context. (In the process, remove the comment saying that the current context is always the old context, because it is wrong.) In all other places, all nodes in a subtree must be in the same context, so we can just poll that. The exception of course is bdrv_drain_all_end(), but that always runs in the main context, so we can just poll NULL (like bdrv_drain_all_begin() does). Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-07-19block: Introduce BdrvChild.parent_quiesce_counterMax Reitz
Commit 5cb2737e925042e6c7cd3fb0b01313950b03cddf laid out why bdrv_do_drained_end() must decrement the quiesce_counter after bdrv_drain_invoke(). It did not give a very good reason why it has to happen after bdrv_parent_drained_end(), instead only claiming symmetry to bdrv_do_drained_begin(). It turns out that delaying it for so long is wrong. Situation: We have an active commit job (i.e. a mirror job) from top to base for the following graph: filter | [file] | v top --[backing]--> base Now the VM is closed, which results in the job being cancelled and a bdrv_drain_all() happening pretty much simultaneously. Beginning the drain means the job is paused once whenever one of its nodes is quiesced. This is reversed when the drain ends. With how the code currently is, after base's drain ends (which means that it will have unpaused the job once), its quiesce_counter remains at 1 while it goes to undrain its parents (bdrv_parent_drained_end()). For some reason or another, undraining filter causes the job to be kicked and enter mirror_exit_common(), where it proceeds to invoke block_job_remove_all_bdrv(). Now base will be detached from the job. Because its quiesce_counter is still 1, it will unpause the job once more. So in total, undraining base will unpause the job twice. Eventually, this will lead to the job's pause_count going negative -- well, it would, were there not an assertion against this, which crashes qemu. The general problem is that if in bdrv_parent_drained_end() we undrain parent A, and then undrain parent B, which then leads to A detaching the child, bdrv_replace_child_noperm() will undrain A as if we had not done so yet; that is, one time too many. It follows that we cannot decrement the quiesce_counter after invoking bdrv_parent_drained_end(). Unfortunately, decrementing it before bdrv_parent_drained_end() would be wrong, too. Imagine the above situation in reverse: Undraining A leads to B detaching the child. If we had already decremented the quiesce_counter by that point, bdrv_replace_child_noperm() would undrain B one time too little; because it expects bdrv_parent_drained_end() to issue this undrain. But bdrv_parent_drained_end() won't do that, because B is no longer a parent. Therefore, we have to do something else. This patch opts for introducing a second quiesce_counter that counts how many times a child's parent has been quiesced (though c->role->drained_*). With that, bdrv_replace_child_noperm() just has to undrain the parent exactly that many times when removing a child, and it will always be right. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-07-15block: Deep-clear inherits_fromMax Reitz
BDS.inherits_from does not always point to an immediate parent node. When launching a block job with a filter node, for example, the node directly below the filter will not point to the filter, but keep its old pointee (above the filter). If that pointee goes away while the job is still running, the node's inherits_from will not be updated and thus point to garbage. To fix this, bdrv_unref_child() has to check not only the parent node's immediate children for nodes whose inherits_from needs to be cleared, but its whole subtree. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190703172813.6868-7-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-07-15block: Add BDS.never_freezeMax Reitz
The commit and the mirror block job must be able to drop their filter node at any point. However, this will not be possible if any of the BdrvChild links to them is frozen. Therefore, we need to prevent them from ever becoming frozen. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190703172813.6868-2-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-06-18block: Ignore loosening perm restrictions failuresMax Reitz
We generally assume that loosening permission restrictions can never fail. We have seen in the past that this assumption is wrong. This has led to crashes because we generally pass &error_abort when loosening permissions. However, a failure in such a case should actually be handled in quite the opposite way: It is very much not fatal, so qemu may report it, but still consider the operation successful. The only realistic problem is that qemu may then retain permissions and thus locks on images it actually does not require. But again, that is not fatal. To implement this behavior, we make all functions that change permissions and that pass &error_abort to the initiating function (bdrv_check_perm() or bdrv_child_check_perm()) evaluate the @loosen_restrictions value introduced in the previous patch. If it is true and an error did occur, we abort the permission update, discard the error, and instead report success to the caller. bdrv_child_try_set_perm() itself does not pass &error_abort, but it is the only public function to change permissions. As such, callers may pass &error_abort to it, expecting dropping permission restrictions to never fail. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-18block: Add *tighten_restrictions to *check*_perm()Max Reitz
This patch makes three functions report whether the necessary permission change tightens restrictions or not. These functions are: - bdrv_check_perm() - bdrv_check_update_perm() - bdrv_child_check_perm() Callers can use this result to decide whether a failure is fatal or not (see the next patch). Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-18block: Fix order in bdrv_replace_child()Max Reitz
We have to start by applying the permission restrictions to new_bs before we can loosen them on old_bs. See the comment for the explanation. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-18block: Add bdrv_child_refresh_perms()Max Reitz
If a block node uses bdrv_child_try_set_perm() to change the permission it takes on its child, the result may be very short-lived. If anything makes the block layer recalculate the permissions internally, it will invoke the node driver's .bdrv_child_perm() implementation. The permission/shared permissions masks that returns will then override the values previously passed to bdrv_child_try_set_perm(). If drivers want a child edge to have specific values for the permissions/shared permissions mask, it must return them in .bdrv_child_perm(). Consequentially, there is no need for them to pass the same values to bdrv_child_try_set_perm() then: It is better to have a function that invokes .bdrv_child_perm() and calls bdrv_child_try_set_perm() with the result. This patch adds such a function under the name of bdrv_child_refresh_perms(). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-18block: drop bs->jobVladimir Sementsov-Ogievskiy
Drop remaining users of bs->job: 1. assertions actually duplicated by assert(!bs->refcnt) 2. trace-point seems not enough reason to change stream_start to return BlockJob pointer 3. Restricting creation of two jobs based on same bs is bad idea, as 3.1 Some jobs creates filters to be their main node, so, this check don't actually prevent creating second job on same real node (which will create another filter node) (but I hope it is restricted by other mechanisms) 3.2 Even without bs->job we have two systems of permissions: op-blockers and BLK_PERM 3.3 We may want to run several jobs on one node one day And finally, drop bs->job pointer itself. Hurrah! Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-04block: Remove bdrv_set_aio_context()Kevin Wolf
All callers of bdrv_set_aio_context() are eliminated now, they have moved to bdrv_try_set_aio_context() and related safe functions. Remove bdrv_set_aio_context(). With this, we can now know that the .set_aio_ctx callback must be present in bdrv_set_aio_context_ignore() because bdrv_can_set_aio_context() would have returned false previously, so instead of checking the condition, we can assert it. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-04block: Remove wrong bdrv_set_aio_context() callsKevin Wolf
The mirror and commit block jobs use bdrv_set_aio_context() to move their filter node into the right AioContext before hooking it up in the graph. Similarly, bdrv_open_backing_file() explicitly moves the backing file node into the right AioContext first. This isn't necessary any more, they get automatically moved into the right context now when attaching them. However, in the case of bdrv_open_backing_file() with a node reference, it's actually not only unnecessary, but even wrong: The unchecked bdrv_set_aio_context() changes the AioContext of the child node even if other parents require it to retain the old context. So this is not only a simplification, but a bug fix, too. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1684342 Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-04block: Move node without parents to main AioContextKevin Wolf
A node should only be in a non-default AioContext if a user is attached to it that requires this. When the last parent of a node is gone, it can move back to the main AioContext. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-04block: Adjust AioContexts when attaching nodesKevin Wolf
So far, we only made sure that updating the AioContext of a node affected the whole subtree. However, if a node is newly attached to a new parent, we also need to make sure that both the subtree of the node and the parent are in the same AioContext. This tries to move the new child node to the parent AioContext and returns an error if this isn't possible. BlockBackends now actually apply their AioContext to their root node. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-04block: Add BlockBackend.ctxKevin Wolf
This adds a new parameter to blk_new() which requires its callers to declare from which AioContext this BlockBackend is going to be used (or the locks of which AioContext need to be taken anyway). The given context is only stored and kept up to date when changing AioContexts. Actually applying the stored AioContext to the root node is saved for another commit. Signed-off-by: Kevin Wolf <kwolf@redhat.com>