aboutsummaryrefslogtreecommitdiff
path: root/block/mirror.c
AgeCommit message (Collapse)Author
2016-04-22mirror: Workaround for unexpected iohandler events during completionFam Zheng
Commit 5a7e7a0ba moved mirror_exit to a BH handler but didn't add any protection against new requests that could sneak in just before the BH is dispatched. For example (assuming a code base at that commit): main_loop_wait # 1 os_host_main_loop_wait g_main_context_dispatch aio_ctx_dispatch aio_dispatch ... mirror_run bdrv_drain (a) block_job_defer_to_main_loop qemu_iohandler_poll virtio_queue_host_notifier_read ... virtio_submit_multiwrite (b) blk_aio_multiwrite main_loop_wait # 2 <snip> aio_dispatch aio_bh_poll (c) mirror_exit At (a) we know the BDS has no pending request. However, the same main_loop_wait call is going to dispatch iohandlers (EventNotifier events), which may lead to a new I/O from guest. So the invariant is already broken at (c). Data loss. Commit f3926945c8 made iohandler to use aio API. The order of virtio_queue_host_notifier_read and block_job_defer_to_main_loop within a main_loop_wait becomes unpredictable, and even worse, if the host notifier event arrives at the next main_loop_wait call, the unpredictable order between mirror_exit and virtio_queue_host_notifier_read is also a trouble. As shown below, this commit made the bug easier to trigger: - Bug case 1: main_loop_wait # 1 os_host_main_loop_wait g_main_context_dispatch aio_ctx_dispatch (qemu_aio_context) ... mirror_run bdrv_drain (a) block_job_defer_to_main_loop aio_ctx_dispatch (iohandler_ctx) virtio_queue_host_notifier_read ... virtio_submit_multiwrite (b) blk_aio_multiwrite main_loop_wait # 2 ... aio_dispatch aio_bh_poll (c) mirror_exit - Bug case 2: main_loop_wait # 1 os_host_main_loop_wait g_main_context_dispatch aio_ctx_dispatch (qemu_aio_context) ... mirror_run bdrv_drain (a) block_job_defer_to_main_loop main_loop_wait # 2 ... aio_ctx_dispatch (iohandler_ctx) virtio_queue_host_notifier_read ... virtio_submit_multiwrite (b) blk_aio_multiwrite aio_dispatch aio_bh_poll (c) mirror_exit In both cases, (b) breaks the invariant wanted by (a) and (c). Until then, the request loss has been silent. Later, 3f09bfbc7be added asserts at (c) to check the invariant (in bdrv_replace_in_backing_chain), and Max reported an assertion failure first visible there, by doing active committing while the guest is running bonnie++. 2.5 added bdrv_drained_begin at (a) to protect the dataplane case from similar problems, but we never realize the main loop bug until now. As a bandage, this patch disables iohandler's external events temporarily together with bs->ctx. Launchpad Bug: 1570134 Cc: qemu-stable@nongnu.org Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-20mirror: Don't extend the last sub-chunkFam Zheng
The last sub-chunk is rounded up to the copy granularity in the target image, resulting in a larger size than the source. Add a function to clip the copied sectors to the end. This undoes the "wrong" changes to tests/qemu-iotests/109.out in e5b43573e28. The remaining two offset changes are okay. [ kwolf: Use DIV_ROUND_UP to calculate nb_chunks now ] Reported-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com>
2016-04-20block/mirror: Refresh stale bitmap iterator cacheMax Reitz
If the drive's dirty bitmap is dirtied while the mirror operation is running, the cache of the iterator used by the mirror code may become stale and not contain all dirty bits. This only becomes an issue if we are looking for contiguously dirty chunks on the drive. In that case, we can easily detect the discrepancy and just refresh the iterator if one occurs. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-20block/mirror: Revive dead yielding codeMax Reitz
mirror_iteration() is supposed to wait if the current chunk is subject to a still in-flight mirroring operation. However, it mixed checking this conflict situation with checking the dirty status of a chunk. A simplification for the latter condition (the first chunk encountered is always dirty) led to neglecting the former: We just skip the first chunk and thus never test whether it conflicts with an in-flight operation. To fix this, pull out the code which waits for in-flight operations on the first chunk of the range to be mirrored to settle. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-11mirror: Replace bdrv_drain(bs) with bdrv_co_drain(bs)Fam Zheng
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 1459855253-5378-3-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-03-30block: Remove bdrv_(set_)enable_write_cache()Kevin Wolf
The only remaining users were block jobs (mirror and backup) which unconditionally enabled WCE on the BlockBackend of the target image. As these block jobs don't go through BlockBackend for their I/O requests, they aren't affected by this setting anyway but always get a writeback mode, so that call can be removed. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-22include/qemu/osdep.h: Don't include qapi/error.hMarkus Armbruster
Commit 57cb38b included qapi/error.h into qemu/osdep.h to get the Error typedef. Since then, we've moved to include qemu/osdep.h everywhere. Its file comment explains: "To avoid getting into possible circular include dependencies, this file should not include any other QEMU headers, with the exceptions of config-host.h, compiler.h, os-posix.h and os-win32.h, all of which are doing a similar job to this file and are under similar constraints." qapi/error.h doesn't do a similar job, and it doesn't adhere to similar constraints: it includes qapi-types.h. That's in excess of 100KiB of crap most .c files don't actually need. Add the typedef to qemu/typedefs.h, and include that instead of qapi/error.h. Include qapi/error.h in .c files that need it and don't get it now. Include qapi-types.h in qom/object.h for uint16List. Update scripts/clean-includes accordingly. Update it further to match reality: replace config.h by config-target.h, add sysemu/os-posix.h, sysemu/os-win32.h. Update the list of includes in the qemu/osdep.h comment quoted above similarly. This reduces the number of objects depending on qapi/error.h from "all of them" to less than a third. Unfortunately, the number depending on qapi-types.h shrinks only a little. More work is needed for that one. Signed-off-by: Markus Armbruster <armbru@redhat.com> [Fix compilation without the spice devel packages. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-02-29mirror: Add mirror_wait_for_ioFam Zheng
The three lines are duplicated a number of times now, refactor a function. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 1454637630-10585-3-git-send-email-famz@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-02-29mirror: Rewrite mirror_iterationFam Zheng
The "pnum < nb_sectors" condition in deciding whether to actually copy data is unnecessarily strict, and the qiov initialization is unnecessarily for bdrv_aio_write_zeroes and bdrv_aio_discard. Rewrite mirror_iteration to fix both flaws. The output of iotests 109 is updated because we now report the offset and len slightly differently in mirroring progress. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 1454637630-10585-2-git-send-email-famz@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-02-02block: Add "file" output parameter to block status query functionsFam Zheng
The added parameter can be used to return the BDS pointer which the valid offset is referring to. Its value should be ignored unless BDRV_BLOCK_OFFSET_VALID in ret is set. Until block drivers fill in the right value, let's clear it explicitly right before calling .bdrv_get_block_status. The "bs->file" condition in bdrv_co_get_block_status is kept now to keep iotest case 102 passing, and will be fixed once all drivers return the right file pointer. Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 1453780743-16806-2-git-send-email-famz@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-01-20block: Clean up includesPeter Maydell
Clean up includes so that osdep.h is included first and headers which it implies are not included manually. This commit was created with scripts/clean-includes. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-12-22block/mirror: replace IOV_MAX with blk_get_max_iov()Stefan Hajnoczi
Use blk_get_max_iov() instead of hardcoding IOV_MAX, which may not apply to all BlockDrivers. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-12-18block: Allow references for backing filesKevin Wolf
For bs->file, using references to existing BDSes has been possible for a while already. This patch enables the same for bs->backing. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2015-12-18mirror: Error out when a BDS would get two BBsKevin Wolf
bdrv_replace_in_backing_chain() asserts that not both old and new BlockDdriverState have a BlockBackend attached to them because both would have to end up pointing to the new BDS and we don't support more than one BB per BDS yet. Before we can safely allow references to existing nodes as backing files, we need to make sure that even if a backing file has a BB on it, this doesn't crash qemu. There are probably also some cases with the 'replaces' option set where drive-mirror could fail this assertion today. They are fixed with this error check as well. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2015-12-02mirror: Quiesce source during "mirror_exit"Fam Zheng
With dataplane, the ioeventfd events could be dispatched after mirror_run releases the dirty bitmap, but before mirror_exit actually does the device switch, because the iothread will still be running, and it will cause silent data loss. Fix this by adding a bdrv_drained_begin/end pair around the window, so that no new external request will be handled. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>
2015-11-12blockjob: Introduce reference count and fix reference to job->bsFam Zheng
Add reference count to block job, meanwhile move the ownership of the reference to job->bs from the caller (which is released in two completion callbacks) to the block job itself. It is necessary for block_job_complete_sync to work, because block job shouldn't live longer than its bs, as asserted in bdrv_delete. Now block_job_complete_sync can be simplified. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com> Message-id: 1446765200-3054-6-git-send-email-jsnow@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11mirror: block all operations on the target image during the jobAlberto Garcia
There's nothing preventing the target image from being used by other operations during the 'drive-mirror' job, so we should block them all until the job is done. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 82b88fd04cde918a08a6f9a4ab85626d7fd7b502.1446475331.git.berto@igalia.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2015-10-23block: Move I/O status and error actions into BBMax Reitz
These options are only relevant for the user of a whole BDS tree (like a guest device or a block job) and should thus be moved into the BlockBackend. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-16block: Add and use bdrv_replace_in_backing_chain()Kevin Wolf
This cleans up the mess we left behind in the mirror code after the previous patch. Instead of using bdrv_swap(), just change pointers. The interface change of the mirror job that callers must consider is that after job completion, their local BDS pointers still point to the same node now. qemu-img must change its code accordingly (which makes it easier to understand); the other callers stays unchanged because after completion they don't do anything with the BDS, but just with the job, and the job is still owned by the source BDS. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-10-16blockjob: Store device name at job creationKevin Wolf
Some block jobs change the block device graph on completion. This means that the device that owns the job and originally was addressed with its device name may no longer be what the corresponding BlockBackend points to. Previously, the effects of bdrv_swap() ensured that the job was (at least partially) transferred to the target image. Events that contain the device name could still use bdrv_get_device_name(job->bs) and get the same result. After removing bdrv_swap(), this won't work any more. Instead, save the device name at job creation and use that copy for QMP events and anything else identifying the job. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-10-16block: Manage backing file references in bdrv_set_backing_hd()Kevin Wolf
This simplifies the code somewhat, especially when dropping whole backing file subchains. The exception is the mirroring code that does adventurous things with bdrv_swap() and in order to keep it working, I had to duplicate most of bdrv_set_backing_hd() locally. We'll get rid again of this ugliness shortly. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-10-16block: Convert bs->backing_hd to BdrvChildKevin Wolf
This is the final step in converting all of the BlockDriverState pointers that block drivers use to BdrvChild. After this patch, bs->children contains the full list of child nodes that are referenced by a given BDS, and these children are only referenced through BdrvChild, so that updating the pointer in there is enough for changing edges in the graph. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-10-12block: switch from g_slice allocator to mallocPaolo Bonzini
Simplify memory allocation by sticking with a single API. GSlice is not that fast anyway (tcmalloc/jemalloc are better). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-10-01block: mirror - fix full sync mode when target does not support zero initJeff Cody
During mirror, if the target device does not support zero init, a mirror may result in a corrupted image for sync="full" mode. This is due to how the initial dirty bitmap is set up prior to copying data - we did not mark sectors as dirty that are unallocated. This means those unallocated sectors are skipped over on the target, and for a device without zero init, invalid data may reside in those holes. If both of the following conditions are true, then we will explicitly mark all sectors as dirty: 1.) sync = "full" 2.) bdrv_has_zero_init(target) == false If the target does support zero init, but a target image is passed in with data already present (i.e. an "existing" image), it is assumed the data present in the existing image is valid data for those sectors. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 91ed4bc5bda7e2b09eb508b07c83f4071fe0b3c9.1443705220.git.jcody@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>
2015-09-02block: more check for replaced nodeWen Congyang
We use mirror+replace to fix quorum's broken child. bs/s->common.bs is quorum, and to_replace is the broken child. The new child is target_bs. Without this patch, the replace node can be any node, and it can be top BDS with BB, or another quorum's child. We just check if the broken child is part of the quorum BDS in this patch. Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Message-id: 55A86486.1000404@cn.fujitsu.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-08-14mirror: Fix coroutine reentranceKevin Wolf
This fixes a regression introduced by commit dcfb3beb ("mirror: Do zero write on target if sectors not allocated"), which was reported to cause aborts with the message "Co-routine re-entered recursively". The cause for this bug is the following code in mirror_iteration_done(): if (s->common.busy) { qemu_coroutine_enter(s->common.co, NULL); } This has always been ugly because - unlike most places that reenter - it doesn't have a specific yield that it pairs with, but is more uncontrolled. What we really mean here is "reenter the coroutine if it's in one of the four explicit yields in mirror.c". This used to be equivalent with s->common.busy because neither mirror_run() nor mirror_iteration() call any function that could yield. However since commit dcfb3beb this doesn't hold true any more: bdrv_get_block_status_above() can yield. So what happens is that bdrv_get_block_status_above() wants to take a lock that is already held, so it adds itself to the queue of waiting coroutines and yields. Instead of being woken up by the unlock function, however, it gets woken up by mirror_iteration_done(), which is obviously wrong. In most cases the code actually happens to cope fairly well with such cases, but in this specific case, the unlock must already have scheduled the coroutine for wakeup when mirror_iteration_done() reentered it. And then the coroutine happened to process the scheduled restarts and tried to reenter itself recursively. This patch fixes the problem by pairing the reenter in mirror_iteration_done() with specific yields instead of abusing s->common.busy. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1439455310-11263-1-git-send-email-kwolf@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>
2015-08-06block/mirror: limit qiov to IOV_MAX elementsStefan Hajnoczi
If mirror has more free buffers than IOV_MAX, preadv(2)/pwritev(2) EINVAL failures may be encountered. It is possible to trigger this by setting granularity to a low value like 8192. This patch stops appending chunks once IOV_MAX is reached. The spurious EINVAL failure can be reproduced with a qcow2 image file and the following QMP invocation: qmp.command('drive-mirror', device='virtio0', target='/tmp/r7.s1', granularity=8192, sync='full', mode='absolute-paths', format='raw') While the guest is running dd if=/dev/zero of=/var/tmp/foo oflag=direct bs=4k. Cc: Jeff Cody <jcody@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1435761950-26714-1-git-send-email-stefanha@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>
2015-07-22mirror: Speed up bitmap initial scanningFam Zheng
Limiting to sectors_per_chunk for each bdrv_is_allocated_above is slow, because the underlying protocol driver would issue much more queries than necessary. We should coalesce the query. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: <1436413678-7114-4-git-send-email-famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-07-14mirror: correct buf_sizeWen Congyang
If bus_size is less than 0, the command fails. If buf_size is 0, use DEFAULT_MIRROR_BUF_SIZE. If buf_size % granularity is not 0, mirror_free_init() will do dangerous things. Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 5555A588.3080907@cn.fujitsu.com Signed-off-by: Jeff Cody <jcody@redhat.com>
2015-07-14block/mirror: Sleep periodically during bitmap scanningFam Zheng
Before, we only yield after initializing dirty bitmap, where the QMP command would return. That may take very long, and guest IO will be blocked. Add sleep points like the later mirror iterations. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Wen Congyang <wency@cn.fujitsu.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1431486673-19280-1-git-send-email-famz@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>
2015-07-07blockjob: add block_job_release functionTing Wang
There is job resource leak in function mirror_start_job, although bdrv_create_dirty_bitmap is unlikely failed. Add block_job_release for each release when needed. Signed-off-by: Ting Wang <kathy.wangting@huawei.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 1435311455-56048-1-git-send-email-kathy.wangting@huawei.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-07-02mirror: Do zero write on target if sectors not allocatedFam Zheng
If guest discards a source cluster, mirroring with bdrv_aio_readv is overkill. Some protocols do zero upon discard, where it's best to use bdrv_aio_write_zeroes, otherwise, bdrv_aio_discard will be enough. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-07-02qmp: Add optional bool "unmap" to drive-mirrorFam Zheng
If specified as "true", it allows discarding on target sectors where source is not allocated. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-07-02qapi: Rename 'dirty-bitmap' mode to 'incremental'John Snow
If we wish to make differential backups a feature that's easy to access, it might be pertinent to rename the "dirty-bitmap" mode to "incremental" to make it clear what /type/ of backup the dirty-bitmap is helping us perform. This is an API breaking change, but 2.4 has not yet gone live, so we have this flexibility. Signed-off-by: John Snow <jsnow@redhat.com> Message-id: 1433463642-21840-2-git-send-email-jsnow@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-06-22Include qapi/qmp/qerror.h exactly where neededMarkus Armbruster
In particular, don't include it into headers. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
2015-06-22qerror: Clean up QERR_ macros to expand into a single stringMarkus Armbruster
These macros expand into error class enumeration constant, comma, string. Unclean. Has been that way since commit 13f59ae. The error class is always ERROR_CLASS_GENERIC_ERROR since the previous commit. Clean up as follows: * Prepend every use of a QERR_ macro by ERROR_CLASS_GENERIC_ERROR, and delete it from the QERR_ macro. No change after preprocessing. * Rewrite error_set(ERROR_CLASS_GENERIC_ERROR, ...) into error_setg(...). Again, no change after preprocessing. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
2015-04-28block/mirror: Always call block_job_sleep_ns()Max Reitz
The mirror block job is trying to take a clever shortcut if delay_ns is 0 and skips block_job_sleep_ns() in that case. But that function must be called in every block job iteration, because otherwise it is for example impossible to pause the job. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-04-28block: Ensure consistent bitmap function prototypesJohn Snow
We often don't need the BlockDriverState for functions that operate on bitmaps. Remove it. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1429314609-29776-15-git-send-email-jsnow@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-04-28qmp: Add support of "dirty-bitmap" sync mode for drive-backupJohn Snow
For "dirty-bitmap" sync mode, the block job will iterate through the given dirty bitmap to decide if a sector needs backup (backup all the dirty clusters and skip clean ones), just as allocation conditions of "top" sync mode. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1429314609-29776-11-git-send-email-jsnow@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-04-28qmp: Add block-dirty-bitmap-add and block-dirty-bitmap-removeJohn Snow
The new command pair is added to manage a user created dirty bitmap. The dirty bitmap's name is mandatory and must be unique for the same device, but different devices can have bitmaps with the same names. The granularity is an optional field. If it is not specified, we will choose a default granularity based on the cluster size if available, clamped to between 4K and 64K to mirror how the 'mirror' code was already choosing granularity. If we do not have cluster size info available, we choose 64K. This code has been factored out into a helper shared with block/mirror. This patch also introduces the 'block_dirty_bitmap_lookup' helper, which takes a device name and a dirty bitmap name and validates the lookup, returning NULL and setting errp if there is a problem with either field. This helper will be re-used in future patches in this series. The types added to block-core.json will be re-used in future patches in this series, see: 'qapi: Add transaction support to block-dirty-bitmap-{add, enable, disable}' Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1429314609-29776-5-git-send-email-jsnow@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-04-28qmp: Ensure consistent granularity typeJohn Snow
We treat this field with a variety of different types everywhere in the code. Now it's just uint32_t. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1429314609-29776-4-git-send-email-jsnow@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-04-28qapi: Add optional field "name" to block dirty bitmapFam Zheng
This field will be set for user created dirty bitmap. Also pass in an error pointer to bdrv_create_dirty_bitmap, so when a name is already taken on this BDS, it can report an error message. This is not global check, two BDSes can have dirty bitmap with a common name. Implemented bdrv_find_dirty_bitmap to find a dirty bitmap by name, will be used later when other QMP commands want to reference dirty bitmap by name. Add bdrv_dirty_bitmap_make_anon. This unsets the name of dirty bitmap. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1429314609-29776-3-git-send-email-jsnow@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-04-28blockjob: Update function name in commentsFam Zheng
Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 1428069921-2957-5-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-04-28blockjob: Allow nested pauseFam Zheng
This patch changes block_job_pause to increase the pause counter and block_job_resume to decrease it. The counter will allow calling block_job_pause/block_job_resume unconditionally on a job when we need to suspend the IO temporarily. From now on, each block_job_resume must be paired with a block_job_pause to keep the counter balanced. The user pause from QMP or HMP will only trigger block_job_pause once until it's resumed, this is achieved by adding a user_paused flag in BlockJob. One occurrence of block_job_resume in mirror_complete is replaced with block_job_enter which does what is necessary. In block_job_cancel, the cancel flag is good enough to instruct coroutines to quit loop, so use block_job_enter to replace the unpaired block_job_resume. Upon block job IO error, user is notified about the entering to the pause state, so this pause belongs to user pause, set the flag accordingly and expect a matching QMP resume. [Extended doc comments as suggested by Paolo Bonzini <pbonzini@redhat.com>. --Stefan] Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 1428069921-2957-2-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-01-23block: mirror - change string allocation to 2-bytesJeff Cody
The backing_filename string in mirror_run() is only used to check for a NULL string, so we don't need to allocate 1024 bytes (or, later, PATH_MAX bytes), when we only need to copy the first 2 characters. We technically only need 1 byte, as we are just checking for NULL, but since backing_filename[] is populated by bdrv_get_backing_filename(), a string size of 1 will always only return '\0'; Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-01-13block: fix spoiling all dirty bitmaps by mirror and migrationVladimir Sementsov-Ogievskiy
Mirror and migration use dirty bitmaps for their purposes, and since commit [block: per caller dirty bitmap] they use their own bitmaps, not the global one. But they use old functions bdrv_set_dirty and bdrv_reset_dirty, which change all dirty bitmaps. Named dirty bitmaps series by Fam and Snow are affected: mirroring and migration will spoil all (not related to this mirroring or migration) named dirty bitmaps. This patch fixes this by adding bdrv_set_dirty_bitmap and bdrv_reset_dirty_bitmap, which change concrete bitmap. Also, to prevent such mistakes in future, old functions bdrv_(set,reset)_dirty are made static, for internal block usage. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com> CC: John Snow <jsnow@redhat.com> CC: Fam Zheng <famz@redhat.com> CC: Denis V. Lunev <den@openvz.org> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1417081246-3593-1-git-send-email-vsementsov@parallels.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2014-11-03block: let mirror blockjob run in BDS AioContextStefan Hajnoczi
The mirror block job must run in the BlockDriverState AioContext so that it works with dataplane. Acquire the AioContext in blockdev.c so starting the block job is safe. Note that to_replace is treated separately from other BlockDriverStates in that it does not need to be in the same AioContext. Explicitly acquire/release to_replace's AioContext when accessing it. The completion code in block/mirror.c must perform BDS graph manipulation and bdrv_reopen() from the main loop. Use block_job_defer_to_main_loop() to achieve that. The bdrv_drain_all() call is not allowed outside the main loop since it could lead to lock ordering problems. Use bdrv_drain(bs) instead because we have acquired the AioContext so nothing else can sneak in I/O. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 1413889440-32577-10-git-send-email-stefanha@redhat.com
2014-11-03block/mirror: Improve progress reportMax Reitz
Instead of taking the total length of the block device as the block job's length, use the number of dirty sectors. The progress is now the number of sectors mirrored to the target block device. Note that this may result in the job's length increasing during operation, which is however in fact desirable. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1414159063-25977-8-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-10-20block: Rename BlockDriverCompletionFunc to BlockCompletionFuncMarkus Armbruster
I'll use it with block backends shortly, and the name is going to fit badly there. It's a block layer thing anyway, not just a block driver thing. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20block: Eliminate BlockDriverState member device_name[]Markus Armbruster
device_name[] can become non-empty only in bdrv_new_root() and bdrv_move_feature_fields(). The latter is used only to undo damage done by bdrv_swap(). The former is called only by blk_new_with_bs(). Therefore, when a BlockDriverState's device_name[] is non-empty, then it's been created with a BlockBackend, and vice versa. Furthermore, blk_new_with_bs() keeps the two names equal. Therefore, device_name[] is redundant. Eliminate it. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>