aboutsummaryrefslogtreecommitdiff
path: root/block/mirror.c
AgeCommit message (Collapse)Author
2022-06-29block: use 'unsigned' for in_flight field on driver stateDenis V. Lunev
This patch makes in_flight field 'unsigned' for BDRVNBDState and MirrorBlockJob. This matches the definition of this field on BDS and is generically correct - we should never get negative value here. Signed-off-by: Denis V. Lunev <den@openvz.org> CC: John Snow <jsnow@redhat.com> CC: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> CC: Kevin Wolf <kwolf@redhat.com> CC: Hanna Reitz <hreitz@redhat.com> CC: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2022-03-07osdep: Move memalign-related functions to their own headerPeter Maydell
Move the various memalign-related functions out of osdep.h and into their own header, which we include only where they are used. While we're doing this, add some brief documentation comments. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 20220226180723.1706285-10-peter.maydell@linaro.org
2022-03-04assertions for block_int global state APIEmanuele Giuseppe Esposito
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20220303151616.325444-13-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-01-14block: drop BLK_PERM_GRAPH_MODVladimir Sementsov-Ogievskiy
First, this permission never protected a node from being changed, as generic child-replacing functions don't check it. Second, it's a strange thing: it presents a permission of parent node to change its child. But generally, children are replaced by different mechanisms, like jobs or qmp commands, not by nodes. Graph-mod permission is hard to understand. All other permissions describe operations which done by parent node on its child: read, write, resize. Graph modification operations are something completely different. The only place where BLK_PERM_GRAPH_MOD is used as "perm" (not shared perm) is mirror_start_job, for s->target. Still modern code should use bdrv_freeze_backing_chain() to protect from graph modification, if we don't do it somewhere it may be considered as a bug. So, it's a bit risky to drop GRAPH_MOD, and analyzing of possible loss of protection is hard. But one day we should do it, let's do it now. One more bit of information is that locking the corresponding byte in file-posix doesn't make sense at all. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210902093754.2352-1-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-12-28blockjob: drop BlockJob.blk fieldVladimir Sementsov-Ogievskiy
It's unused now (except for permission handling)[*]. The only reasonable user of it was block-stream job, recently updated to use own blk. And other block jobs prefer to use own source node related objects. So, the arguments of dropping the field are: - block jobs prefer not to use it - block jobs usually has more then one node to operate on, and better to operate symmetrically (for example has both source and target blk's in specific block-job state structure) *: BlockJob.blk is used to keep some permissions. We simply move permissions to block-job child created in block_job_create() together with blk. In mirror, we just should not care anymore about restoring state of blk. Most probably this code could be dropped long ago, after dropping bs->job pointer. Now it finally goes away together with BlockJob.blk itself. iotest 141 output is updated, as "bdrv_has_blk(bs)" check in qmp_blockdev_del() doesn't fail (we don't have blk now). Still, new error message looks even better. In iotest 283 we need to add a job id, otherwise "Invalid job ID" happens now earlier than permission check (as permissions moved from blk to block-job node). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Nikita Lapshin <nikita.lapshin@virtuozzo.com>
2021-10-07mirror: Do not clear .cancelledHanna Reitz
Clearing .cancelled before leaving the main loop when the job has been soft-cancelled is no longer necessary since job_is_cancelled() only returns true for jobs that have been force-cancelled. Therefore, this only makes a differences in places that call job_cancel_requested(). In block/mirror.c, this is done only before .cancelled was cleared. In job.c, there are two callers: - job_completed_txn_abort() asserts that .cancelled is true, so keeping it true will not affect this place. - job_complete() refuses to let a job complete that has .cancelled set. It is correct to refuse to let the user invoke job-complete on mirror jobs that have already been soft-cancelled. With this change, there are no places that reset .cancelled to false and so we can be sure that .force_cancel can only be true if .cancelled is true as well. Assert this in job_is_cancelled(). Signed-off-by: Hanna Reitz <hreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20211006151940.214590-13-hreitz@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2021-10-07mirror: Stop active mirroring after force-cancelHanna Reitz
Once the mirror job is force-cancelled (job_is_cancelled() is true), we should not generate new I/O requests. This applies to active mirroring, too, so stop it once the job is cancelled. (We must still forward all I/O requests to the source, though, of course, but those are not really I/O requests generated by the job, so this is fine.) Signed-off-by: Hanna Reitz <hreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20211006151940.214590-12-hreitz@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2021-10-07mirror: Check job_is_cancelled() earlierHanna Reitz
We must check whether the job is force-cancelled early in our main loop, most importantly before any `continue` statement. For example, we used to have `continue`s before our current checking location that are triggered by `mirror_flush()` failing. So, if `mirror_flush()` kept failing, force-cancelling the job would not terminate it. Jobs can be cancelled while they yield, and once they are (force-cancelled), they should not generate new I/O requests. Therefore, we should put the check after the last yield before mirror_iteration() is invoked. Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462 Signed-off-by: Hanna Reitz <hreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20211006151940.214590-11-hreitz@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2021-10-07mirror: Use job_is_cancelled()Hanna Reitz
mirror_drained_poll() returns true whenever the job is cancelled, because "we [can] be sure that it won't issue more requests". However, this is only true for force-cancelled jobs, so use job_is_cancelled(). Signed-off-by: Hanna Reitz <hreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20211006151940.214590-10-hreitz@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2021-10-07job: Add job_cancel_requested()Hanna Reitz
Most callers of job_is_cancelled() actually want to know whether the job is on its way to immediate termination. For example, we refuse to pause jobs that are cancelled; but this only makes sense for jobs that are really actually cancelled. A mirror job that is cancelled during READY with force=false should absolutely be allowed to pause. This "cancellation" (which is actually a kind of completion) may take an indefinite amount of time, and so should behave like any job during normal operation. For example, with on-target-error=stop, the job should stop on write errors. (In contrast, force-cancelled jobs should not get write errors, as they should just terminate and not do further I/O.) Therefore, redefine job_is_cancelled() to only return true for jobs that are force-cancelled (which as of HEAD^ means any job that interprets the cancellation request as a request for immediate termination), and add job_cancel_requested() as the general variant, which returns true for any jobs which have been requested to be cancelled, whether it be immediately or after an arbitrarily long completion phase. Finally, here is a justification for how different job_is_cancelled() invocations are treated by this patch: - block/mirror.c (mirror_run()): - The first invocation is a while loop that should loop until the job has been cancelled or scheduled for completion. What kind of cancel does not matter, only the fact that the job is supposed to end. - The second invocation wants to know whether the job has been soft-cancelled. Calling job_cancel_requested() is a bit too broad, but if the job were force-cancelled, we should leave the main loop as soon as possible anyway, so this should not matter here. - The last two invocations already check force_cancel, so they should continue to use job_is_cancelled(). - block/backup.c, block/commit.c, block/stream.c, anything in tests/: These jobs know only force-cancel, so there is no difference between job_is_cancelled() and job_cancel_requested(). We can continue using job_is_cancelled(). - job.c: - job_pause_point(), job_yield(), job_sleep_ns(): Only force-cancelled jobs should be prevented from being paused. Continue using job_is_cancelled(). - job_update_rc(), job_finalize_single(), job_finish_sync(): These functions are all called after the job has left its main loop. The mirror job (the only job that can be soft-cancelled) will clear .cancelled before leaving the main loop if it has been soft-cancelled. Therefore, these functions will observe .cancelled to be true only if the job has been force-cancelled. We can continue to use job_is_cancelled(). (Furthermore, conceptually, a soft-cancelled mirror job should not report to have been cancelled. It should report completion (see also the block-job-cancel QAPI documentation). Therefore, it makes sense for these functions not to distinguish between a soft-cancelled mirror job and a job that has completed as normal.) - job_completed_txn_abort(): All jobs other than @job have been force-cancelled. job_is_cancelled() must be true for them. Regarding @job itself: job_completed_txn_abort() is mostly called when the job's return value is not 0. A soft-cancelled mirror has a return value of 0, and so will not end up here then. However, job_cancel() invokes job_completed_txn_abort() if the job has been deferred to the main loop, which is mostly the case for completed jobs (which skip the assertion), but not for sure. To be safe, use job_cancel_requested() in this assertion. - job_complete(): This is function eventually invoked by the user (through qmp_block_job_complete() or qmp_job_complete(), or job_complete_sync(), which comes from qemu-img). The intention here is to prevent a user from invoking job-complete after the job has been cancelled. This should also apply to soft cancelling: After a mirror job has been soft-cancelled, the user should not be able to decide otherwise and have it complete as normal (i.e. pivoting to the target). - job_cancel(): Both functions are equivalent (see comment there), but we want to use job_is_cancelled(), because this shows that we call job_completed_txn_abort() only for force-cancelled jobs. (As explained for job_update_rc(), soft-cancelled jobs should be treated as if they have completed as normal.) Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462 Signed-off-by: Hanna Reitz <hreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20211006151940.214590-9-hreitz@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2021-10-07jobs: Give Job.force_cancel more meaningHanna Reitz
We largely have two cancel modes for jobs: First, there is actual cancelling. The job is terminated as soon as possible, without trying to reach a consistent result. Second, we have mirror in the READY state. Technically, the job is not really cancelled, but it just is a different completion mode. The job can still run for an indefinite amount of time while it tries to reach a consistent result. We want to be able to clearly distinguish which cancel mode a job is in (when it has been cancelled). We can use Job.force_cancel for this, but right now it only reflects cancel requests from the user with force=true, but clearly, jobs that do not even distinguish between force=false and force=true are effectively always force-cancelled. So this patch has Job.force_cancel signify whether the job will terminate as soon as possible (force_cancel=true) or whether it will effectively remain running despite being "cancelled" (force_cancel=false). To this end, we let jobs that provide JobDriver.cancel() tell the generic job code whether they will terminate as soon as possible or not, and for jobs that do not provide that method we assume they will. Signed-off-by: Hanna Reitz <hreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20211006151940.214590-7-hreitz@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2021-10-07mirror: Drop s->syncedHanna Reitz
As of HEAD^, there is no meaning to s->synced other than whether the job is READY or not. job_is_ready() gives us that information, too. Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20211006151940.214590-4-hreitz@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2021-10-07mirror: Keep s->synced on errorHanna Reitz
An error does not take us out of the READY phase, which is what s->synced signifies. It does of course mean that source and target are no longer in sync, but that is what s->actively_sync is for -- s->synced never meant that source and target are in sync, only that they were at some point (and at that point we transitioned into the READY phase). The tangible problem is that we transition to READY once we are in sync and s->synced is false. By resetting s->synced here, we will transition from READY to READY once the error is resolved (if the job keeps running), and that transition is not allowed. Signed-off-by: Hanna Reitz <hreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20211006151940.214590-3-hreitz@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2021-09-29block: use int64_t instead of int in driver discard handlersVladimir Sementsov-Ogievskiy
We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). So, convert driver discard handlers bytes parameter to int64_t. The only caller of all updated function is bdrv_co_pdiscard in block/io.c. It is already prepared to work with 64bit requests, but pass at most max(bs->bl.max_pdiscard, INT_MAX) to the driver. Let's look at all updated functions: blkdebug: all calculations are still OK, thanks to bdrv_check_qiov_request(). both rule_check and bdrv_co_pdiscard are 64bit blklogwrites: pass to blk_loc_writes_co_log which is 64bit blkreplay, copy-on-read, filter-compress: pass to bdrv_co_pdiscard, OK copy-before-write: pass to bdrv_co_pdiscard which is 64bit and to cbw_do_copy_before_write which is 64bit file-posix: one handler calls raw_account_discard() is 64bit and both handlers calls raw_do_pdiscard(). Update raw_do_pdiscard, which pass to RawPosixAIOData::aio_nbytes, which is 64bit (and calls raw_account_discard()) gluster: somehow, third argument of glfs_discard_async is size_t. Let's set max_pdiscard accordingly. iscsi: iscsi_allocmap_set_invalid is 64bit, !is_byte_request_lun_aligned is 64bit. list.num is uint32_t. Let's clarify max_pdiscard and pdiscard_alignment. mirror_top: pass to bdrv_mirror_top_do_write() which is 64bit nbd: protocol limitation. max_pdiscard is alredy set strict enough, keep it as is for now. nvme: buf.nlb is uint32_t and we do shift. So, add corresponding limits to nvme_refresh_limits(). preallocate: pass to bdrv_co_pdiscard() which is 64bit. rbd: pass to qemu_rbd_start_co() which is 64bit. qcow2: calculations are still OK, thanks to bdrv_check_qiov_request(), qcow2_cluster_discard() is 64bit. raw-format: raw_adjust_offset() is 64bit, bdrv_co_pdiscard too. throttle: pass to bdrv_co_pdiscard() which is 64bit and to throttle_group_co_io_limits_intercept() which is 64bit as well. test-block-iothread: bytes argument is unused Great! Now all drivers are prepared to handle 64bit discard requests, or else have explicit max_pdiscard limits. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210903102807.27127-11-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2021-09-29block: use int64_t instead of int in driver write_zeroes handlersVladimir Sementsov-Ogievskiy
We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). So, convert driver write_zeroes handlers bytes parameter to int64_t. The only caller of all updated function is bdrv_co_do_pwrite_zeroes(). bdrv_co_do_pwrite_zeroes() itself is of course OK with widening of callee parameter type. Also, bdrv_co_do_pwrite_zeroes()'s max_write_zeroes is limited to INT_MAX. So, updated functions all are safe, they will not get "bytes" larger than before. Still, let's look through all updated functions, and add assertions to the ones which are actually unprepared to values larger than INT_MAX. For these drivers also set explicit max_pwrite_zeroes limit. Let's go: blkdebug: calculations can't overflow, thanks to bdrv_check_qiov_request() in generic layer. rule_check() and bdrv_co_pwrite_zeroes() both have 64bit argument. blklogwrites: pass to blk_log_writes_co_log() with 64bit argument. blkreplay, copy-on-read, filter-compress: pass to bdrv_co_pwrite_zeroes() which is OK copy-before-write: Calls cbw_do_copy_before_write() and bdrv_co_pwrite_zeroes, both have 64bit argument. file-posix: both handler calls raw_do_pwrite_zeroes, which is updated. In raw_do_pwrite_zeroes() calculations are OK due to bdrv_check_qiov_request(), bytes go to RawPosixAIOData::aio_nbytes which is uint64_t. Check also where that uint64_t gets handed: handle_aiocb_write_zeroes_block() passes a uint64_t[2] to ioctl(BLKZEROOUT), handle_aiocb_write_zeroes() calls do_fallocate() which takes off_t (and we compile to always have 64-bit off_t), as does handle_aiocb_write_zeroes_unmap. All look safe. gluster: bytes go to GlusterAIOCB::size which is int64_t and to glfs_zerofill_async works with off_t. iscsi: Aha, here we deal with iscsi_writesame16_task() that has uint32_t num_blocks argument and iscsi_writesame16_task() has uint16_t argument. Make comments, add assertions and clarify max_pwrite_zeroes calculation. iscsi_allocmap_() functions already has int64_t argument is_byte_request_lun_aligned is simple to update, do it. mirror_top: pass to bdrv_mirror_top_do_write which has uint64_t argument nbd: Aha, here we have protocol limitation, and NBDRequest::len is uint32_t. max_pwrite_zeroes is cleanly set to 32bit value, so we are OK for now. nvme: Again, protocol limitation. And no inherent limit for write-zeroes at all. But from code that calculates cdw12 it's obvious that we do have limit and alignment. Let's clarify it. Also, obviously the code is not prepared to handle bytes=0. Let's handle this case too. trace events already 64bit preallocate: pass to handle_write() and bdrv_co_pwrite_zeroes(), both 64bit. rbd: pass to qemu_rbd_start_co() which is 64bit. qcow2: offset + bytes and alignment still works good (thanks to bdrv_check_qiov_request()), so tail calculation is OK qcow2_subcluster_zeroize() has 64bit argument, should be OK trace events updated qed: qed_co_request wants int nb_sectors. Also in code we have size_t used for request length which may be 32bit. So, let's just keep INT_MAX as a limit (aligning it down to pwrite_zeroes_alignment) and don't care. raw-format: Is OK. raw_adjust_offset and bdrv_co_pwrite_zeroes are both 64bit. throttle: Both throttle_group_co_io_limits_intercept() and bdrv_co_pwrite_zeroes() are 64bit. vmdk: pass to vmdk_pwritev which is 64bit quorum: pass to quorum_co_pwritev() which is 64bit Hooray! At this point all block drivers are prepared to support 64bit write-zero requests, or have explicitly set max_pwrite_zeroes. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210903102807.27127-8-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: use <= rather than < in assertions relying on max_pwrite_zeroes] Signed-off-by: Eric Blake <eblake@redhat.com>
2021-09-29block: use int64_t instead of uint64_t in driver write handlersVladimir Sementsov-Ogievskiy
We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). So, convert driver write handlers parameters which are already 64bit to signed type. While being here, convert also flags parameter to be BdrvRequestFlags. Now let's consider all callers. Simple git grep '\->bdrv_\(aio\|co\)_pwritev\(_part\)\?' shows that's there three callers of driver function: bdrv_driver_pwritev() and bdrv_driver_pwritev_compressed() in block/io.c, both pass int64_t, checked by bdrv_check_qiov_request() to be non-negative. qcow2_save_vmstate() does bdrv_check_qiov_request(). Still, the functions may be called directly, not only by drv->... Let's check: git grep '\.bdrv_\(aio\|co\)_pwritev\(_part\)\?\s*=' | \ awk '{print $4}' | sed 's/,//' | sed 's/&//' | sort | uniq | \ while read func; do git grep "$func(" | \ grep -v "$func(BlockDriverState"; done shows several callers: qcow2: qcow2_co_truncate() write at most up to @offset, which is checked in generic qcow2_co_truncate() by bdrv_check_request(). qcow2_co_pwritev_compressed_task() pass the request (or part of the request) that already went through normal write path, so it should be OK qcow: qcow_co_pwritev_compressed() pass int64_t, it's updated by this patch quorum: quorum_co_pwrite_zeroes() pass int64_t and int - OK throttle: throttle_co_pwritev_compressed() pass int64_t, it's updated by this patch vmdk: vmdk_co_pwritev_compressed() pass int64_t, it's updated by this patch Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210903102807.27127-5-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2021-09-29block: use int64_t instead of uint64_t in driver read handlersVladimir Sementsov-Ogievskiy
We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). So, convert driver read handlers parameters which are already 64bit to signed type. While being here, convert also flags parameter to be BdrvRequestFlags. Now let's consider all callers. Simple git grep '\->bdrv_\(aio\|co\)_preadv\(_part\)\?' shows that's there three callers of driver function: bdrv_driver_preadv() in block/io.c, passes int64_t, checked by bdrv_check_qiov_request() to be non-negative. qcow2_load_vmstate() does bdrv_check_qiov_request(). do_perform_cow_read() has uint64_t argument. And a lot of things in qcow2 driver are uint64_t, so converting it is big job. But we must not work with requests that don't satisfy bdrv_check_qiov_request(), so let's just assert it here. Still, the functions may be called directly, not only by drv->... Let's check: git grep '\.bdrv_\(aio\|co\)_preadv\(_part\)\?\s*=' | \ awk '{print $4}' | sed 's/,//' | sed 's/&//' | sort | uniq | \ while read func; do git grep "$func(" | \ grep -v "$func(BlockDriverState"; done The only one such caller: QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, &data, 1); ... ret = bdrv_replace_test_co_preadv(bs, 0, 1, &qiov, 0); in tests/unit/test-bdrv-drain.c, and it's OK obviously. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210903102807.27127-4-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: fix typos] Signed-off-by: Eric Blake <eblake@redhat.com>
2021-09-15block/mirror: fix NULL pointer dereference in mirror_wait_on_conflicts()Stefano Garzarella
In mirror_iteration() we call mirror_wait_on_conflicts() with `self` parameter set to NULL. Starting from commit d44dae1a7c we dereference `self` pointer in mirror_wait_on_conflicts() without checks if it is not NULL. Backtrace: Program terminated with signal SIGSEGV, Segmentation fault. #0 mirror_wait_on_conflicts (self=0x0, s=<optimized out>, offset=<optimized out>, bytes=<optimized out>) at ../block/mirror.c:172 172 self->waiting_for_op = op; [Current thread is 1 (Thread 0x7f0908931ec0 (LWP 380249))] (gdb) bt #0 mirror_wait_on_conflicts (self=0x0, s=<optimized out>, offset=<optimized out>, bytes=<optimized out>) at ../block/mirror.c:172 #1 0x00005610c5d9d631 in mirror_run (job=0x5610c76a2c00, errp=<optimized out>) at ../block/mirror.c:491 #2 0x00005610c5d58726 in job_co_entry (opaque=0x5610c76a2c00) at ../job.c:917 #3 0x00005610c5f046c6 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at ../util/coroutine-ucontext.c:173 #4 0x00007f0909975820 in ?? () at ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91 from /usr/lib64/libc.so.6 Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2001404 Fixes: d44dae1a7c ("block/mirror: fix active mirror dead-lock in mirror_wait_on_conflicts") Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20210910124533.288318-1-sgarzare@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-07-20block/mirror: fix active mirror dead-lock in mirror_wait_on_conflictsVladimir Sementsov-Ogievskiy
It's possible that requests start to wait each other in mirror_wait_on_conflicts(). To avoid it let's use same technique as in block/io.c in bdrv_wait_serialising_requests_locked() / bdrv_find_conflicting_request(): don't wait on intersecting request if it is already waiting for some other request. For details of the dead-lock look at testIntersectingActiveIO() test-case which we actually fixing now. Fixes: d06107ade0ce74dc39739bac80de84b51ec18546 Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210702211636.228981-4-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-20block/mirror: set .co for active-write MirrorOp objectsVladimir Sementsov-Ogievskiy
This field is unused, but it very helpful for debugging. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210702211636.228981-2-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-05-14mirror: stop cancelling in-flight requests on non-force cancel in READYVladimir Sementsov-Ogievskiy
If mirror is READY than cancel operation is not discarding the whole result of the operation, but instead it's a documented way get a point-in-time snapshot of source disk. So, we should not cancel any requests if mirror is READ and force=false. Let's fix that case. Note, that bug that we have before this commit is not critical, as the only .bdrv_cancel_in_flight implementation is nbd_cancel_in_flight() and it cancels only requests waiting for reconnection, so it should be rare case. Fixes: 521ff8b779b11c394dbdc43f02e158dd99df308a Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210421075858.40197-1-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-04-30block: bdrv_append(): don't consume referenceVladimir Sementsov-Ogievskiy
We have too much comments for this feature. It seems better just don't do it. Most of real users (tests don't count) have to create additional reference. Drop also comment in external_snapshot_prepare: - bdrv_append doesn't "remove" old bs in common sense, it sounds strange - the fact that bdrv_append can fail is obvious from the context - the fact that we must rollback all changes in transaction abort is known (it's the direct role of abort) Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210428151804.439460-5-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-04-09mirror: Do not enter a paused job on completionMax Reitz
Currently, it is impossible to complete jobs on standby (i.e. paused ready jobs), but actually the only thing in mirror_complete() that does not work quite well with a paused job is the job_enter() at the end. If we make it conditional, this function works just fine even if the mirror job is paused. So technically this is a no-op, but obviously the intention is to accept block-job-complete even for jobs on standby, which we need this patch for first. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210409120422.144040-3-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-04-09mirror: Move open_backing_file to exit_commonMax Reitz
This is a graph change and therefore should be done in job-finalize (which is what invokes mirror_exit_common()). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210409120422.144040-2-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-03-29block/mirror: Fix mirror_top's permissionsMax Reitz
mirror_top currently shares all permissions, and takes only the WRITE permission (if some parent has taken that permission, too). That is wrong, though; mirror_top is a filter, so it should take permissions like any other filter does. For example, if the parent needs CONSISTENT_READ, we need to take that, too, and if it cannot share the WRITE permission, we cannot share it either. The exception is when mirror_top is used for active commit, where we cannot take CONSISTENT_READ (because it is deliberately unshared above the base node) and where we must share WRITE (so that it is shared for all images in the backing chain, so the mirror job can take it for the target BB). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210211172242.146671-2-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2021-03-08block/mirror: drop extra error propagation in commit_active_start()Vladimir Sementsov-Ogievskiy
Let's check return value of mirror_start_job to check for failure instead of local_err. Rename ret to job, as ret is usually integer variable. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20210202124956.63146-7-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-13Merge remote-tracking branch 'remotes/ericb/tags/pull-bitmaps-2021-02-12' ↵Peter Maydell
into staging bitmaps patches for 2021-02-12 - add 'transform' member to manipulate bitmaps across migration - work towards better error handling during bdrv_open # gpg: Signature made Fri 12 Feb 2021 23:19:39 GMT # gpg: using RSA key 71C2CC22B1C4602927D2F3AAA7A16B4A2527436A # gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full] # gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full] # gpg: aka "[jpeg image of size 6874]" [full] # Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A * remotes/ericb/tags/pull-bitmaps-2021-02-12: block: use return status of bdrv_append() block: return status from bdrv_append and friends qemu-iotests: 300: Add test case for modifying persistence of bitmap migration: dirty-bitmap: Allow control of bitmap persistence migration: dirty-bitmap: Use struct for alias map inner members Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-02-12block: use return status of bdrv_append()Vladimir Sementsov-Ogievskiy
Now bdrv_append returns status and we can drop all the local_err things around it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20210202124956.63146-3-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2021-02-12block/mirror: implement .cancel job handlerVladimir Sementsov-Ogievskiy
Cancel in-flight io on target to not waste the time. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210205163720.887197-6-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2020-10-30block: Return depth level during bdrv_is_allocated_aboveEric Blake
When checking for allocation across a chain, it's already easy to count the depth within the chain at which the allocation is found. Instead of throwing that information away, return it to the caller. Existing callers only cared about allocated/non-allocated, but having a depth available will be used by NBD in the next patch. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20201027050556.269064-9-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [eblake: rebase to master] Signed-off-by: Eric Blake <eblake@redhat.com>
2020-09-07block: Inline bdrv_co_block_status_from_*()Max Reitz
With bdrv_filter_bs(), we can easily handle this default filter behavior in bdrv_co_block_status(). blkdebug wants to have an additional assertion, so it keeps its own implementation, except bdrv_co_block_status_from_file() needs to be inlined there. Suggested-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2020-09-07mirror: Deal with filtersMax Reitz
This includes some permission limiting (for example, we only need to take the RESIZE permission for active commits where the base is smaller than the top). base_overlay is introduced so we can query bdrv_is_allocated_above() on it - we cannot do that with base itself, because a filter's block_status is the same as its child node, so if there are filters on base, bdrv_is_allocated_above() on base would return information including base. Use this opportunity to rename qmp_drive_mirror()'s "source" BDS to "target_backing_bs", because that is what it really refers to. Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-05-18block: Drop @child_class from bdrv_child_perm()Max Reitz
Implementations should decide the necessary permissions based on @role. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200513110544.176672-35-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18block: Pass BdrvChildRole to bdrv_child_perm()Max Reitz
For now, all callers pass 0 and no callee evaluates this value. Later patches will change both. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-7-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18block: Rename BdrvChildRole to BdrvChildClassMax Reitz
This structure nearly only contains parent callbacks for child state changes. It cannot really reflect a child's role, because different roles may overlap (as we will see when real roles are introduced), and because parents can have custom callbacks even when the child fulfills a standard role. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200513110544.176672-4-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18block: Mark commit, mirror, blkreplay as filtersMax Reitz
The commit, mirror, and blkreplay block nodes are filters, so they should be marked as such. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200513110544.176672-2-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18mirror: Make sure that source and target size matchKevin Wolf
If the target is shorter than the source, mirror would copy data until it reaches the end of the target and then fail with an I/O error when trying to write past the end. If the target is longer than the source, the mirror job would complete successfully, but the target wouldn't actually be an accurate copy of the source image (it would contain some additional garbage at the end). Fix this by checking that both images have the same size when the job starts. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200511135825.219437-4-kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-04-30block-backend: Add flags to blk_truncate()Kevin Wolf
Now that node level interface bdrv_truncate() supports passing request flags to the block driver, expose this on the BlockBackend level, too. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200424125448.63318-4-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-03-27mirror: Wait only for in-flight operationsKevin Wolf
mirror_wait_for_free_in_flight_slot() just picks a random operation to wait for. However, a MirrorOp is already in s->ops_in_flight when mirror_co_read() waits for free slots, so if not enough slots are immediately available, an operation can end up waiting for itself, or two or more operations can wait for each other to complete, which results in a hang. Fix this by adding a flag to MirrorOp that tells us if the request is already in flight (and therefore occupies slots that it will later free), and picking only such operations for waiting. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1794692 Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200326153628.4869-3-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-03-27Revert "mirror: Don't let an operation wait for itself"Kevin Wolf
This reverts commit 7e6c4ff792734e196c8ca82564c56b5e7c6288ca. The fix was incomplete as it only protected against requests waiting for themselves, but not against requests waiting for each other. We need a different solution. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200326153628.4869-2-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-03-26block/mirror: fix use after free of local_errVladimir Sementsov-Ogievskiy
local_err is used again in mirror_exit_common() after bdrv_set_backing_hd(), so we must zero it. Otherwise try to set non-NULL local_err will crash. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200324153630.11882-3-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-02-18mirror: Double-check immediately before replacingMax Reitz
There is no guarantee that we can still replace the node we want to replace at the end of the mirror job. Double-check by calling bdrv_recurse_can_replace(). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200218103454.296704-12-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-02-18mirror: Don't let an operation wait for itselfKevin Wolf
mirror_wait_for_free_in_flight_slot() just picks a random operation to wait for. However, when mirror_co_read() waits for free slots, its MirrorOp is already in s->ops_in_flight, so if not enough slots are immediately available, an operation can end up waiting for itself to complete, which results in a hang. Fix this by passing the current MirrorOp and skipping this operation when picking an operation to wait for. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1794692 Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2020-02-18mirror: Store MirrorOp.co for debuggabilityKevin Wolf
If a coroutine is launched, but the coroutine pointer isn't stored anywhere, debugging any problems inside the coroutine is quite hard. Let's store the coroutine pointer of a mirror operation in MirrorOp to have it available in the debugger. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2019-10-28block: Add @exact parameter to bdrv_co_truncate()Max Reitz
We have two drivers (iscsi and file-posix) that (in some cases) return success from their .bdrv_co_truncate() implementation if the block device is larger than the requested offset, but cannot be shrunk. Some callers do not want that behavior, so this patch adds a new parameter that they can use to turn off that behavior. This patch just adds the parameter and lets the block/io.c and block/block-backend.c functions pass it around. All other callers always pass false and none of the implementations evaluate it, so that this patch does not change existing behavior. Future patches take care of that. Suggested-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190918095144.955-5-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-28mirror: Do not dereference invalid pointersMax Reitz
mirror_exit_common() may be called twice (if it is called from mirror_prepare() and fails, it will be called from mirror_abort() again). In such a case, many of the pointers in the MirrorBlockJob object will already be freed. This can be seen most reliably for s->target, which is set to NULL (and then dereferenced by blk_bs()). Cc: qemu-stable@nongnu.org Fixes: 737efc1eda23b904fbe0e66b37715fb0e5c3e58b Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20191014153931.20699-2-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-28Revert "mirror: Only mirror granularity-aligned chunks"Vladimir Sementsov-Ogievskiy
This reverts commit 9adc1cb49af8d4e54f57980b1eed5c0a4b2dafa6. "mirror: Only mirror granularity-aligned chunks" Since previous commit unaligned chunks are supported by do_sync_target_write. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20191011090711.19940-6-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-28block/mirror: support unaligned write in active mirrorVladimir Sementsov-Ogievskiy
Prior 9adc1cb49af8d do_sync_target_write had a bug: it reset aligned-up region in the dirty bitmap, which means that we may not copy some bytes and assume them copied, which actually leads to producing corrupted target. So 9adc1cb49af8d forced dirty bitmap granularity to be request_alignment for mirror-top filter, so we are not working with unaligned requests. However forcing large alignment obviously decreases performance of unaligned requests. This commit provides another solution for the problem: if unaligned padding is already dirty, we can safely ignore it, as 1. It's dirty, it will be copied by mirror_iteration anyway 2. It's dirty, so skipping it now we don't increase dirtiness of the bitmap and therefore don't damage "synchronicity" of the write-blocking mirror. If unaligned padding is not dirty, we just write it, no reason to touch dirty bitmap if we succeed (on failure we'll set the whole region ofcourse, but we loss "synchronicity" on failure anyway). Note: we need to disable dirty_bitmap, otherwise we will not be able to see in do_sync_target_write bitmap state before current operation. We may of course check dirty bitmap before the operation in bdrv_mirror_top_do_write and remember it, but we don't need active dirty bitmap for write-blocking mirror anyway. New code-path is unused until the following commit reverts 9adc1cb49af8d. Suggested-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20191011090711.19940-5-vsementsov@virtuozzo.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-28block/mirror: simplify do_sync_target_writeVladimir Sementsov-Ogievskiy
do_sync_target_write is called from bdrv_mirror_top_do_write after write/discard operation, all inside active_write/active_write_settle protecting us from mirror iteration. So the whole area is dirty for sure, no reason to examine dirty bitmap. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20191011090711.19940-3-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-17block/dirty-bitmap: add bs linkVladimir Sementsov-Ogievskiy
Add bs field to BdrvDirtyBitmap structure. Drop BlockDriverState parameter from bitmap APIs where possible. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190916141911.5255-3-vsementsov@virtuozzo.com [Rebased on top of block-copy. --js] Signed-off-by: John Snow <jsnow@redhat.com>