aboutsummaryrefslogtreecommitdiff
path: root/block/quorum.c
AgeCommit message (Collapse)Author
2023-12-21graph-lock: remove AioContext lockingStefan Hajnoczi
Stop acquiring/releasing the AioContext lock in bdrv_graph_wrlock()/bdrv_graph_unlock() since the lock no longer has any effect. The distinction between bdrv_graph_wrunlock() and bdrv_graph_wrunlock_ctx() becomes meaningless and they can be collapsed into one function. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20231205182011.1976568-6-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-11-21block: Fix deadlocks in bdrv_graph_wrunlock()Kevin Wolf
bdrv_graph_wrunlock() calls aio_poll(), which may run callbacks that have a nested event loop. Nested event loops can depend on other iothreads making progress, so in order to allow them to make progress it must not hold the AioContext lock of another thread while calling aio_poll(). This introduces a @bs parameter to bdrv_graph_wrunlock() whose AioContext is temporarily dropped (which matches bdrv_graph_wrlock()), and a bdrv_graph_wrunlock_ctx() that can be used if the BlockDriverState doesn't necessarily exist any more when unlocking. This also requires a change to bdrv_schedule_unref(), which was relying on the incorrectly taken lock. It needs to take the lock itself now. While this is a separate bug, it can't be fixed a separate patch because otherwise the intermediate state would either deadlock or try to release a lock that we don't even hold. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20231115172012.112727-3-kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> [kwolf: Fixed up bdrv_schedule_unref()] Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-10-12block: Mark bdrv_get_parent_name() and callers GRAPH_RDLOCKKevin Wolf
This adds GRAPH_RDLOCK annotations to declare that callers of bdrv_get_parent_name() need to hold a reader lock for the graph because it accesses the parents list of a node. For some places, we know that they will hold the lock, but we don't have the GRAPH_RDLOCK annotations yet. In this case, add assume_graph_lock() with a FIXME comment. These places will be removed once everything is properly annotated. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20230929145157.45443-13-kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-09-20block: Mark bdrv_add/del_child() and caller GRAPH_WRLOCKKevin Wolf
The functions read the parents list in the generic block layer, so we need to hold the graph lock already there. The BlockDriver implementations actually modify the graph, so it has to be a writer lock. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20230911094620.45040-22-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-09-20block: Mark bdrv_unref_child() GRAPH_WRLOCKKevin Wolf
Instead of taking the writer lock internally, require callers to already hold it when calling bdrv_unref_child(). These callers will typically already hold the graph lock once the locking work is completed, which means that they can't call functions that take it internally. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20230911094620.45040-21-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-09-20block: Mark bdrv_attach_child() GRAPH_WRLOCKKevin Wolf
Instead of taking the writer lock internally, require callers to already hold it when calling bdrv_attach_child_common(). These callers will typically already hold the graph lock once the locking work is completed, which means that they can't call functions that take it internally. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20230911094620.45040-13-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-10block: Mark bdrv_recurse_can_replace() and callers GRAPH_RDLOCKKevin Wolf
This adds GRAPH_RDLOCK annotations to declare that callers of bdrv_recurse_can_replace() need to hold a reader lock for the graph because it accesses the children list of a node. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20230504115750.54437-20-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-02-23block: Mark bdrv_co_refresh_total_sectors() and callers GRAPH_RDLOCKKevin Wolf
This adds GRAPH_RDLOCK annotations to declare that callers of bdrv_co_refresh_total_sectors() need to hold a reader lock for the graph. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230203152202.49054-24-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-02-23block: Mark public read/write functions GRAPH_RDLOCKKevin Wolf
This adds GRAPH_RDLOCK annotations to declare that callers of bdrv_co_pread*/pwrite*() need to hold a reader lock for the graph. For some places, we know that they will hold the lock, but we don't have the GRAPH_RDLOCK annotations yet. In this case, add assume_graph_lock() with a FIXME comment. These places will be removed once everything is properly annotated. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230203152202.49054-12-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-02-23block: Mark read/write in block/io.c GRAPH_RDLOCKKevin Wolf
This adds GRAPH_RDLOCK annotations to declare that callers of bdrv_driver_*() need to hold a reader lock for the graph. It doesn't add the annotation to public functions yet. For some places, we know that they will hold the lock, but we don't have the GRAPH_RDLOCK annotations yet. In this case, add assume_graph_lock() with a FIXME comment. These places will be removed once everything is properly annotated. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230203152202.49054-11-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-02-23block: Mark bdrv_co_pwrite_zeroes() and callers GRAPH_RDLOCKKevin Wolf
This adds GRAPH_RDLOCK annotations to declare that callers of bdrv_co_pwrite_zeroes() need to hold a reader lock for the graph. For some places, we know that they will hold the lock, but we don't have the GRAPH_RDLOCK annotations yet. In this case, add assume_graph_lock() with a FIXME comment. These places will be removed once everything is properly annotated. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230203152202.49054-10-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-02-23block: Mark bdrv_co_flush() and callers GRAPH_RDLOCKEmanuele Giuseppe Esposito
This adds GRAPH_RDLOCK annotations to declare that callers of bdrv_co_flush() need to hold a reader lock for the graph. For some places, we know that they will hold the lock, but we don't have the GRAPH_RDLOCK annotations yet. In this case, add assume_graph_lock() with a FIXME comment. These places will be removed once everything is properly annotated. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230203152202.49054-8-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-02-23block: Mark bdrv_co_block_status() and callers GRAPH_RDLOCKKevin Wolf
This adds GRAPH_RDLOCK annotations to declare that callers of bdrv_co_block_status() need to hold a reader lock for the graph. For some places, we know that they will hold the lock, but we don't have the GRAPH_RDLOCK annotations yet. In this case, add assume_graph_lock() with a FIXME comment. These places will be removed once everything is properly annotated. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230203152202.49054-5-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-02-01block: Convert bdrv_refresh_total_sectors() to co_wrapper_mixedEmanuele Giuseppe Esposito
BlockDriver->bdrv_getlength is categorized as IO callback, and it currently doesn't run in a coroutine. We should let it take a graph rdlock since the callback traverses the block nodes graph, which however is only possible in a coroutine. Therefore turn it into a co_wrapper to move the actual function into a coroutine where the lock can be taken. Because now this function creates a new coroutine and polls, we need to take the AioContext lock where it is missing, for the only reason that internally co_wrapper calls AIO_WAIT_WHILE and it expects to release the AioContext lock. This is especially messy when a co_wrapper creates a coroutine and polls in bdrv_open_driver, because this function has so many callers in so many context that it can easily lead to deadlocks. Therefore the new rule for bdrv_open_driver is that the caller must always hold the AioContext lock of the given bs (except if it is a coroutine), because the function calls bdrv_refresh_total_sectors() which is now a co_wrapper. Once the rwlock is ultimated and placed in every place it needs to be, we will poll using AIO_WAIT_WHILE_UNLOCKED and remove the AioContext lock. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230113204212.359076-7-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-12-14qapi block: Elide redundant has_FOO in generated CMarkus Armbruster
The has_FOO for pointer-valued FOO are redundant, except for arrays. They are also a nuisance to work with. Recent commit "qapi: Start to elide redundant has_FOO in generated C" provided the means to elide them step by step. This is the step for qapi/block*.json. Said commit explains the transformation in more detail. There is one instance of the invariant violation mentioned there: qcow2_signal_corruption() passes false, "" when node_name is an empty string. Take care to pass NULL then. The previous two commits cleaned up two more. Additionally, helper bdrv_latency_histogram_stats() loses its output parameters and returns a value instead. Cc: Kevin Wolf <kwolf@redhat.com> Cc: Hanna Reitz <hreitz@redhat.com> Cc: qemu-block@nongnu.org Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20221104160712.3005652-11-armbru@redhat.com> [Fixes for #ifndef LIBRBD_SUPPORTS_ENCRYPTION and MacOS squashed in]
2022-10-07quorum: Remove unnecessary forward declarationKevin Wolf
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20221006122607.162769-1-kwolf@redhat.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-10-07quorum: add missing coroutine_fn annotationsPaolo Bonzini
Callers of coroutine_fn must be coroutine_fn themselves, or the call must be within "if (qemu_in_coroutine())". Apply coroutine_fn to functions where this holds. Reviewed-by: Alberto Faria <afaria@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220922084924.201610-19-pbonzini@redhat.com> [kwolf: Fixed up coding style] Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-03-07osdep: Move memalign-related functions to their own headerPeter Maydell
Move the various memalign-related functions out of osdep.h and into their own header, which we include only where they are used. While we're doing this, add some brief documentation comments. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 20220226180723.1706285-10-peter.maydell@linaro.org
2021-09-29block: use int64_t instead of int in driver write_zeroes handlersVladimir Sementsov-Ogievskiy
We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). So, convert driver write_zeroes handlers bytes parameter to int64_t. The only caller of all updated function is bdrv_co_do_pwrite_zeroes(). bdrv_co_do_pwrite_zeroes() itself is of course OK with widening of callee parameter type. Also, bdrv_co_do_pwrite_zeroes()'s max_write_zeroes is limited to INT_MAX. So, updated functions all are safe, they will not get "bytes" larger than before. Still, let's look through all updated functions, and add assertions to the ones which are actually unprepared to values larger than INT_MAX. For these drivers also set explicit max_pwrite_zeroes limit. Let's go: blkdebug: calculations can't overflow, thanks to bdrv_check_qiov_request() in generic layer. rule_check() and bdrv_co_pwrite_zeroes() both have 64bit argument. blklogwrites: pass to blk_log_writes_co_log() with 64bit argument. blkreplay, copy-on-read, filter-compress: pass to bdrv_co_pwrite_zeroes() which is OK copy-before-write: Calls cbw_do_copy_before_write() and bdrv_co_pwrite_zeroes, both have 64bit argument. file-posix: both handler calls raw_do_pwrite_zeroes, which is updated. In raw_do_pwrite_zeroes() calculations are OK due to bdrv_check_qiov_request(), bytes go to RawPosixAIOData::aio_nbytes which is uint64_t. Check also where that uint64_t gets handed: handle_aiocb_write_zeroes_block() passes a uint64_t[2] to ioctl(BLKZEROOUT), handle_aiocb_write_zeroes() calls do_fallocate() which takes off_t (and we compile to always have 64-bit off_t), as does handle_aiocb_write_zeroes_unmap. All look safe. gluster: bytes go to GlusterAIOCB::size which is int64_t and to glfs_zerofill_async works with off_t. iscsi: Aha, here we deal with iscsi_writesame16_task() that has uint32_t num_blocks argument and iscsi_writesame16_task() has uint16_t argument. Make comments, add assertions and clarify max_pwrite_zeroes calculation. iscsi_allocmap_() functions already has int64_t argument is_byte_request_lun_aligned is simple to update, do it. mirror_top: pass to bdrv_mirror_top_do_write which has uint64_t argument nbd: Aha, here we have protocol limitation, and NBDRequest::len is uint32_t. max_pwrite_zeroes is cleanly set to 32bit value, so we are OK for now. nvme: Again, protocol limitation. And no inherent limit for write-zeroes at all. But from code that calculates cdw12 it's obvious that we do have limit and alignment. Let's clarify it. Also, obviously the code is not prepared to handle bytes=0. Let's handle this case too. trace events already 64bit preallocate: pass to handle_write() and bdrv_co_pwrite_zeroes(), both 64bit. rbd: pass to qemu_rbd_start_co() which is 64bit. qcow2: offset + bytes and alignment still works good (thanks to bdrv_check_qiov_request()), so tail calculation is OK qcow2_subcluster_zeroize() has 64bit argument, should be OK trace events updated qed: qed_co_request wants int nb_sectors. Also in code we have size_t used for request length which may be 32bit. So, let's just keep INT_MAX as a limit (aligning it down to pwrite_zeroes_alignment) and don't care. raw-format: Is OK. raw_adjust_offset and bdrv_co_pwrite_zeroes are both 64bit. throttle: Both throttle_group_co_io_limits_intercept() and bdrv_co_pwrite_zeroes() are 64bit. vmdk: pass to vmdk_pwritev which is 64bit quorum: pass to quorum_co_pwritev() which is 64bit Hooray! At this point all block drivers are prepared to support 64bit write-zero requests, or have explicitly set max_pwrite_zeroes. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210903102807.27127-8-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: use <= rather than < in assertions relying on max_pwrite_zeroes] Signed-off-by: Eric Blake <eblake@redhat.com>
2021-09-29block: use int64_t instead of uint64_t in driver write handlersVladimir Sementsov-Ogievskiy
We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). So, convert driver write handlers parameters which are already 64bit to signed type. While being here, convert also flags parameter to be BdrvRequestFlags. Now let's consider all callers. Simple git grep '\->bdrv_\(aio\|co\)_pwritev\(_part\)\?' shows that's there three callers of driver function: bdrv_driver_pwritev() and bdrv_driver_pwritev_compressed() in block/io.c, both pass int64_t, checked by bdrv_check_qiov_request() to be non-negative. qcow2_save_vmstate() does bdrv_check_qiov_request(). Still, the functions may be called directly, not only by drv->... Let's check: git grep '\.bdrv_\(aio\|co\)_pwritev\(_part\)\?\s*=' | \ awk '{print $4}' | sed 's/,//' | sed 's/&//' | sort | uniq | \ while read func; do git grep "$func(" | \ grep -v "$func(BlockDriverState"; done shows several callers: qcow2: qcow2_co_truncate() write at most up to @offset, which is checked in generic qcow2_co_truncate() by bdrv_check_request(). qcow2_co_pwritev_compressed_task() pass the request (or part of the request) that already went through normal write path, so it should be OK qcow: qcow_co_pwritev_compressed() pass int64_t, it's updated by this patch quorum: quorum_co_pwrite_zeroes() pass int64_t and int - OK throttle: throttle_co_pwritev_compressed() pass int64_t, it's updated by this patch vmdk: vmdk_co_pwritev_compressed() pass int64_t, it's updated by this patch Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210903102807.27127-5-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2021-09-29block: use int64_t instead of uint64_t in driver read handlersVladimir Sementsov-Ogievskiy
We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). So, convert driver read handlers parameters which are already 64bit to signed type. While being here, convert also flags parameter to be BdrvRequestFlags. Now let's consider all callers. Simple git grep '\->bdrv_\(aio\|co\)_preadv\(_part\)\?' shows that's there three callers of driver function: bdrv_driver_preadv() in block/io.c, passes int64_t, checked by bdrv_check_qiov_request() to be non-negative. qcow2_load_vmstate() does bdrv_check_qiov_request(). do_perform_cow_read() has uint64_t argument. And a lot of things in qcow2 driver are uint64_t, so converting it is big job. But we must not work with requests that don't satisfy bdrv_check_qiov_request(), so let's just assert it here. Still, the functions may be called directly, not only by drv->... Let's check: git grep '\.bdrv_\(aio\|co\)_preadv\(_part\)\?\s*=' | \ awk '{print $4}' | sed 's/,//' | sed 's/&//' | sort | uniq | \ while read func; do git grep "$func(" | \ grep -v "$func(BlockDriverState"; done The only one such caller: QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, &data, 1); ... ret = bdrv_replace_test_co_preadv(bs, 0, 1, &qiov, 0); in tests/unit/test-bdrv-drain.c, and it's OK obviously. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210903102807.27127-4-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: fix typos] Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-02block/quorum: Provide .bdrv_co_flush instead of .bdrv_co_flush_to_diskLukas Straub
The quorum block driver uses a custom flush callback to handle the case when some children return io errors. In that case it still returns success if enough children are healthy. However, it provides it as the .bdrv_co_flush_to_disk callback, not as .bdrv_co_flush. This causes the block layer to do it's own generic flushing for the children instead, which doesn't handle errors properly. Fix this by providing .bdrv_co_flush instead of .bdrv_co_flush_to_disk so the block layer uses the custom flush callback. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Reported-by: Minghao Yuan <meeho@qq.com> Message-Id: <20210518134214.11ccf05f@gecko.fritz.box> Tested-by: Zhang Chen <chen.zhang@intel.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-03-08block: check return value of bdrv_open_child and drop error propagationVladimir Sementsov-Ogievskiy
This patch is generated by cocci script: @@ symbol bdrv_open_child, errp, local_err; expression file; @@ file = bdrv_open_child(..., - &local_err + errp ); - if (local_err) + if (!file) { ... - error_propagate(errp, local_err); ... } with command spatch --sp-file x.cocci --macro-file scripts/cocci-macro-file.h \ --in-place --no-show-diff --max-width 80 --use-gitgrep block Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20210202124956.63146-4-vsementsov@virtuozzo.com> [eblake: fix qcow2_do_open() to use ERRP_GUARD, necessary as the only caller to pass allow_none=true] Signed-off-by: Eric Blake <eblake@redhat.com>
2020-12-18quorum: Implement bdrv_co_pwrite_zeroes()Alberto Garcia
This simply calls bdrv_co_pwrite_zeroes() in all children. bs->supported_zero_flags is also set to the flags that are supported by all children. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <2f09c842781fe336b4c2e40036bba577b7430190.1605286097.git.berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-12-18quorum: Implement bdrv_co_block_status()Alberto Garcia
The quorum driver does not implement bdrv_co_block_status() and because of that it always reports to contain data even if all its children are known to be empty. One consequence of this is that if we for example create a quorum with a size of 10GB and we mirror it to a new image the operation will write 10GB of actual zeroes to the destination image wasting a lot of time and disk space. Since a quorum has an arbitrary number of children of potentially different formats there is no way to report all possible allocation status flags in a way that makes sense, so this implementation only reports when a given region is known to contain zeroes (BDRV_BLOCK_ZERO) or not (BDRV_BLOCK_DATA). If all children agree that a region contains zeroes then we can return BDRV_BLOCK_ZERO using the smallest size reported by the children (because all agree that a region of at least that size contains zeroes). If at least one child disagrees we have to return BDRV_BLOCK_DATA. In this case we use the largest of the sizes reported by the children that didn't return BDRV_BLOCK_ZERO (because we know that there won't be an agreement for at least that size). Signed-off-by: Alberto Garcia <berto@igalia.com> Tested-by: Tao Xu <tao3.xu@intel.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <db83149afcf0f793effc8878089d29af4c46ffe1.1605286097.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-12-10Tweak a few "Parameter 'NAME' expects THING" error messageMarkus Armbruster
Change to "expects a THING" where that's an obvious improvement Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20201113082626.2725812-11-armbru@redhat.com>
2020-11-17quorum: Require WRITE perm with rewrite-corruptedMax Reitz
Using rewrite-corrupted means quorum may issue writes to its children just from receiving read requests from its parents. Thus, it must take the WRITE permission when rewrite-corrupted is used. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201113211718.261671-2-mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-09-15block/quorum.c: stable children namesLukas Straub
If we remove the child with the highest index from the quorum, decrement s->next_child_index. This way we get stable children names as long as we only remove the last child. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Fixes: https://bugs.launchpad.net/bugs/1881231 Reviewed-by: Zhang Chen <chen.zhang@intel.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <5d5f930424c1c770754041aa8ad6421dc4e2b58e.1596536719.git.lukasstraub2@web.de> Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-07-10error: Reduce unnecessary error propagationMarkus Armbruster
When all we do with an Error we receive into a local variable is propagating to somewhere else, we can just as well receive it there right away, even when we need to keep error_propagate() for other error paths. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200707160613.848843-38-armbru@redhat.com>
2020-07-10error: Avoid unnecessary error_propagate() after error_setg()Markus Armbruster
Replace error_setg(&err, ...); error_propagate(errp, err); by error_setg(errp, ...); Related pattern: if (...) { error_setg(&err, ...); goto out; } ... out: error_propagate(errp, err); return; When all paths to label out are that way, replace by if (...) { error_setg(errp, ...); return; } and delete the label along with the error_propagate(). When we have at most one other path that actually needs to propagate, and maybe one at the end that where propagation is unnecessary, e.g. foo(..., &err); if (err) { goto out; } ... bar(..., &err); out: error_propagate(errp, err); return; move the error_propagate() to where it's needed, like if (...) { foo(..., &err); error_propagate(errp, err); return; } ... bar(..., errp); return; and transform the error_setg() as above. In some places, the transformation results in obviously unnecessary error_propagate(). The next few commits will eliminate them. Bonus: the elimination of gotos will make later patches in this series easier to review. Candidates for conversion tracked down with this Coccinelle script: @@ identifier err, errp; expression list args; @@ - error_setg(&err, args); + error_setg(errp, args); ... when != err error_propagate(errp, err); Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200707160613.848843-34-armbru@redhat.com>
2020-07-10qemu-option: Use returned bool to check for failureMarkus Armbruster
The previous commit enables conversion of foo(..., &err); if (err) { ... } to if (!foo(..., &err)) { ... } for QemuOpts functions that now return true / false on success / error. Coccinelle script: @@ identifier fun = { opts_do_parse, parse_option_bool, parse_option_number, parse_option_size, qemu_opt_parse, qemu_opt_rename, qemu_opt_set, qemu_opt_set_bool, qemu_opt_set_number, qemu_opts_absorb_qdict, qemu_opts_do_parse, qemu_opts_from_qdict_entry, qemu_opts_set, qemu_opts_validate }; expression list args, args2; typedef Error; Error *err; @@ - fun(args, &err, args2); - if (err) + if (!fun(args, &err, args2)) { ... } A few line breaks tidied up manually. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200707160613.848843-15-armbru@redhat.com> [Conflict with commit 0b6786a9c1 "block/amend: refactor qcow2 amend options" resolved by rerunning Coccinelle on master's version]
2020-05-18block: Drop @child_class from bdrv_child_perm()Max Reitz
Implementations should decide the necessary permissions based on @role. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200513110544.176672-35-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18block: Switch child_format users to child_of_bdsMax Reitz
Both users (quorum and blkverify) use child_format for not-really-filtered children, so the appropriate BdrvChildRole in both cases is DATA. (Note that this will cause bdrv_inherited_options() to force-allow format probing.) Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-22-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18block: Pass BdrvChildRole to bdrv_child_perm()Max Reitz
For now, all callers pass 0 and no callee evaluates this value. Later patches will change both. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-7-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18block: Add BdrvChildRole to BdrvChildMax Reitz
For now, it is always set to 0. Later patches in this series will ensure that all callers pass an appropriate combination of flags. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-6-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18block: Rename BdrvChildRole to BdrvChildClassMax Reitz
This structure nearly only contains parent callbacks for child state changes. It cannot really reflect a child's role, because different roles may overlap (as we will see when real roles are introduced), and because parents can have custom callbacks even when the child fulfills a standard role. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200513110544.176672-4-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-02-18quorum: Stop marking it as a filterMax Reitz
Quorum is not a filter, for example because it cannot guarantee which of its children will serve the next request. Thus, any of its children may differ from the data visible to quorum's parents. We have other filters with multiple children, but they differ in this aspect: - blkverify quits the whole qemu process if its children differ. As such, we can always skip it when we want to skip it (as a filter node) by going to any of its children. Both have the same data. - replication generally serves requests from bs->file, so this is its only actually filtered child. - Block job filters currently only have one child, but they will probably get more children in the future. Still, they will always have only one actually filtered child. Having "filters" as a dedicated node category only makes sense if you can skip them by going to a one fixed child that always shows the same data as the filter node. Quorum cannot fulfill this, so it is not a filter. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200218103454.296704-13-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-02-18block: Remove bdrv_recurse_is_first_non_filter()Max Reitz
It no longer has any users. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200218103454.296704-11-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-02-18quorum: Implement .bdrv_recurse_can_replace()Max Reitz
Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200218103454.296704-9-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-02-18quorum: Fix child permissionsMax Reitz
Quorum cannot share WRITE or RESIZE on its children. Presumably, it only does so because as a filter, it seemed intuitively correct to point its .bdrv_child_perm to bdrv_filter_default_perm(). However, it is not really a filter, and bdrv_filter_default_perm() does not work for it, so we have to provide a custom .bdrv_child_perm implementation. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200218103454.296704-6-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-12Include qemu/module.h where needed, drop it from qemu-common.hMarkus Armbruster
Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190523143508.25387-4-armbru@redhat.com> [Rebased with conflicts resolved automatically, except for hw/usb/dev-hub.c hw/misc/exynos4210_rng.c hw/misc/bcm2835_rng.c hw/misc/aspeed_scu.c hw/display/virtio-vga.c hw/arm/stm32f205_soc.c; ui/cocoa.m fixed up]
2019-05-28block: Make bdrv_root_attach_child() unref child_bs on failureAlberto Garcia
A consequence of the previous patch is that bdrv_attach_child() transfers the reference to child_bs from the caller to parent_bs, which will drop it on bdrv_close() or when someone calls bdrv_unref_child(). But this only happens when bdrv_attach_child() succeeds. If it fails then the caller is responsible for dropping the reference to child_bs. This patch makes bdrv_attach_child() take the reference also when there is an error, freeing the caller for having to do it. A similar situation happens with bdrv_root_attach_child(), so the changes on this patch affect both functions. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: 20dfb3d9ccec559cdd1a9690146abad5d204a186.1557754872.git.berto@igalia.com [mreitz: Removed now superfluous BdrvChild * variable in bdrv_open_child()] Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25block: Purify .bdrv_refresh_filename()Max Reitz
Currently, BlockDriver.bdrv_refresh_filename() is supposed to both refresh the filename (BDS.exact_filename) and set BDS.full_open_options. Now that we have generic code in the central bdrv_refresh_filename() for creating BDS.full_open_options, we can drop the latter part from all BlockDriver.bdrv_refresh_filename() implementations. This also means that we can drop all of the existing default code for this from the global bdrv_refresh_filename() itself. Furthermore, we now have to call BlockDriver.bdrv_refresh_filename() after having set BDS.full_open_options, because the block driver's implementation should now be allowed to depend on BDS.full_open_options being set correctly. Finally, with this patch we can drop the @options parameter from BlockDriver.bdrv_refresh_filename(); also, add a comment on this function's purpose in block/block_int.h while touching its interface. This completely obsoletes blklogwrite's implementation of .bdrv_refresh_filename(). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190201192935.18394-25-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25block: Add BlockDriver.bdrv_gather_child_optionsMax Reitz
Some follow-up patches will rework the way bs->full_open_options is refreshed in bdrv_refresh_filename(). The new implementation will remove the need for the block drivers' bdrv_refresh_filename() implementations to set bs->full_open_options; instead, it will be generic and use static information from each block driver. However, by implementing bdrv_gather_child_options(), block drivers will still be able to override the way the full_open_options of their children are incorporated into their own. We need to implement this function for VMDK because we have to prevent the generic implementation from gathering the options of all children: It is not possible to specify options for the extents through the runtime options. For quorum, the child names that would be used by the generic implementation and the ones that we actually (currently) want to use differ. See quorum_gather_child_options() for more information. Note that both of these are cases which are not ideal: In case of VMDK it would probably be nice to be able to specify options for all extents. In case of quorum, the current runtime option structure is simply broken and needs to be fixed (but that is left for another patch). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190201192935.18394-23-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25block: Add strong_runtime_opts to BlockDriverMax Reitz
This new field can be set by block drivers to list the runtime options they accept that may influence the contents of the respective BDS. As of a follow-up patch, this list will be used by the common bdrv_refresh_filename() implementation to decide which options to put into BDS.full_open_options (and consequently whether a JSON filename has to be created), thus freeing the drivers of having to implement that logic themselves. Additionally, this patch adds the field to all of the block drivers that need it and sets it accordingly. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190201192935.18394-22-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25quorum: Make bdrv_dirname() return NULLMax Reitz
While the common implementation for bdrv_dirname() should return NULL for quorum BDSs already (because they do not have a file node and their exact_filename field should be empty), there is no reason not to make that explicit. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190201192935.18394-17-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25block: Use children list in bdrv_refresh_filenameMax Reitz
bdrv_refresh_filename() should invoke itself recursively on all children, not just on file. With that change, we can remove the manual invocations in blkverify, quorum, commit, mirror, and blklogwrites. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190201192935.18394-3-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-11-05quorum: Forbid adding children in blkverify modeAlberto Garcia
The blkverify mode of Quorum only works when the number of children is exactly two, so any attempt to add a new one must return an error. quorum_del_child() on the other hand doesn't need any additional check because decreasing the number of children would make it go under the vote threshold. Signed-off-by: Alberto Garcia <berto@igalia.com> Reported-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-11-05quorum: Return an error if the blkverify mode has invalid settingsAlberto Garcia
The blkverify mode of Quorum can only be enabled if the number of children is exactly two and the value of vote-threshold is also two. If the user tries to enable it but the other settings are incorrect then QEMU simply prints an error message to stderr and carries on disabling the blkverify setting. This patch makes quorum_open() fail and return an error in this case. Signed-off-by: Alberto Garcia <berto@igalia.com> Reported-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-11-05quorum: Remove quorum_err()Alberto Garcia
This is a static function with only one caller, so there's no need to keep it. Inlining the code in quorum_compare() makes it much simpler. Signed-off-by: Alberto Garcia <berto@igalia.com> Reported-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>