aboutsummaryrefslogtreecommitdiff
path: root/include/block
AgeCommit message (Collapse)Author
2016-07-05nbd: Allow larger requestsEric Blake
The NBD layer was breaking up request at a limit of 2040 sectors (just under 1M) to cater to old qemu-nbd. But the server limit was raised to 32M in commit 2d8214885 to match the kernel, more than three years ago; and the upstream NBD Protocol is proposing documentation that without any explicit communication to state otherwise, a client should be able to safely assume that a 32M transaction will work. It is time to rely on the larger sizing, and any downstream distro that cares about maximum interoperability to older qemu-nbd servers can just tweak the value of #define NBD_MAX_SECTORS. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: qemu-stable@nongnu.org Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-06-20blockjob: add AioContext attached callbackStefan Hajnoczi
Block jobs that use additional BDSes or event loop resources need a callback to get their affairs in order when the AioContext is switched. Simple block jobs don't need an attach callback, they automatically work thanks to the generic attach/detach notifiers that this patch adds. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1466096189-6477-7-git-send-email-stefanha@redhat.com
2016-06-20block: use safe iteration over AioContext notifiersStefan Hajnoczi
It's possible that an AioContext notifier user was close to finishing when .detach_aio_context() or .attached_aio_context() is called. In that case they may call bdrv_remove_aio_context_notifier() during the callback. Use safe iteration to avoid crashing when the notifier list is modified during iteration. We must not only handle the case where the current aio notifier is removed during a callback but also the one where any other aio notifier is removed. The next patch adds an AioContext notifier for block jobs and they really could be terminating just as .detach_aio_context() is invoked. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1466096189-6477-6-git-send-email-stefanha@redhat.com
2016-06-20blockjob: add pause pointsStefan Hajnoczi
Block jobs are coroutines that usually perform I/O but sometimes also sleep or yield. Currently only sleeping or yielded block jobs can be paused. This means jobs that do not sleep or yield (using block_job_yield()) are unaffected by block_job_pause(). Add block_job_pause_point() so that block jobs can mark quiescent points that are suitable for pausing. This solves the problem that it can take a block job a long time to pause if it is performing a long series of I/O operations. Transitioning to paused state involves a .pause()/.resume() callback. These callbacks are used to ensure that I/O and event loop activity has ceased while the job is at a pause point. Note that this patch introduces a stricter pause state than previously. The job->busy flag was incorrectly documented as a quiescent state without I/O pending. This is violated by any job that has I/O pending across sleep or block_job_yield(), like the mirror block job. [Add missing block_job_should_pause() check to avoid deadlock after job->driver->pause() in block_job_pause_point(). --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1466096189-6477-4-git-send-email-stefanha@redhat.com
2016-06-20blockjob: rename block_job_is_paused()Stefan Hajnoczi
The block_job_is_paused() function name is not great because callers only use it to determine whether pausing has been requested. Rename it to highlight those semantics and remove it from the public header file as there are no external callers. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1466096189-6477-3-git-send-email-stefanha@redhat.com
2016-06-16nbd: Avoid magic number for NBD max name sizeEric Blake
Declare a constant and use that when determining if an export name fits within the constraints we are willing to support. Note that upstream NBD recently documented that clients MUST support export names of 256 bytes (not including trailing NUL), and SHOULD support names up to 4096 bytes. 4096 is a bit big (we would lose benefits of stack-allocation of a name array), and we already have other limits in place (for example, qcow2 snapshot names are clamped around 1024). So for now, just stick to the required minimum, as that's easier to audit than a full-scale support for larger names. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1463006384-7734-12-git-send-email-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16nbd: simplify the nbd_request and nbd_reply structsPaolo Bonzini
These structs are never used to represent the bytes that go over the network. The big-endian network data is built into a uint8_t array in nbd_{receive,send}_{request,reply}. Remove the unused magic field, reorder the struct to avoid holes, and remove the packed attribute. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16block/mirror: Fix target backing BDSMax Reitz
Currently, we are trying to move the backing BDS from the source to the target in bdrv_replace_in_backing_chain() which is called from mirror_exit(). However, mirror_complete() already tries to open the target's backing chain with a call to bdrv_open_backing_file(). First, we should only set the target's backing BDS once. Second, the mirroring block job has a better idea of what to set it to than the generic code in bdrv_replace_in_backing_chain() (in fact, the latter's conditions on when to move the backing BDS from source to target are not really correct). Therefore, remove that code from bdrv_replace_in_backing_chain() and leave it to mirror_complete(). Depending on what kind of mirroring is performed, we furthermore want to use different strategies to open the target's backing chain: - If blockdev-mirror is used, we can assume the user made sure that the target already has the correct backing chain. In particular, we should not try to open a backing file if the target does not have any yet. - If drive-mirror with mode=absolute-paths is used, we can and should reuse the already existing chain of nodes that the source BDS is in. In case of sync=full, no backing BDS is required; with sync=top, we just link the source's backing BDS to the target, and with sync=none, we use the source BDS as the target's backing BDS. We should not try to open these backing files anew because this would lead to two BDSs existing per physical file in the backing chain, and we would like to avoid such concurrent access. - If drive-mirror with mode=existing is used, we have to use the information provided in the physical image file which means opening the target's backing chain completely anew, just as it has been done already. If the target's backing chain shares images with the source, this may lead to multiple BDSs per physical image file. But since we cannot reliably ascertain this case, there is nothing we can do about it. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20160610185750.30956-3-mreitz@redhat.com Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-06-16block: Remove bs->zero_beyond_eofKevin Wolf
It is always true for open images now. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-16block: Make bdrv_load/save_vmstate coroutine_fnsKevin Wolf
This allows drivers to share code between normal I/O and vmstate accesses. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-16block: Make .bdrv_load_vmstate() vectoredKevin Wolf
This brings it in line with .bdrv_save_vmstate(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-16block: Introduce bdrv_preadv()Kevin Wolf
We already have a byte-based bdrv_pwritev(), but the read counterpart was still missing. This commit adds it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-16block: Byte-based bdrv_co_do_copy_on_readv()Kevin Wolf
In a first step to convert the common I/O path to work on bytes rather than sectors, this converts the copy-on-read logic that is used by bdrv_aligned_preadv(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-16block: Assert that flags are in rangeEric Blake
Add a new BDRV_REQ_MASK constant, and use it to make sure that caller flags are always valid. Tested with 'make check' and with qemu-iotests on both '-raw' and '-qcow2'; the only failure turned up was fixed in the previous commit. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-06-08block: Kill bdrv_co_write_zeroes()Eric Blake
Now that all drivers have been converted to a byte interface, we no longer need a sector interface. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-06-08block: Switch bdrv_write_zeroes() to byte interfaceEric Blake
Rename to bdrv_pwrite_zeroes() to let the compiler ensure we cater to the updated semantics. Do the same for bdrv_co_write_zeroes(). Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-06-08block: Add .bdrv_co_pwrite_zeroes()Eric Blake
Update bdrv_co_do_write_zeroes() to be byte-based, and select between the new byte-based bdrv_co_pwrite_zeroes() or the old bdrv_co_write_zeroes(). The next patches will convert drivers, then remove the old interface. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-06-08block: Track write zero limits in bytesEric Blake
Another step towards removing sector-based interfaces: convert the maximum write and minimum alignment values from sectors to bytes. Rename the variables to let the compiler check that all users are converted to the new semantics. The maximum remains an int as long as BDRV_REQUEST_MAX_SECTORS is constrained by INT_MAX (this means that we can't even support a 2G write_zeroes, but just under it) - changing operation lengths to unsigned or to 64-bits is a much bigger audit, and debatable if we even want to do it (since at the core, a 32-bit platform will still have ssize_t as its underlying limit on write()). Meanwhile, alignment is changed to 'uint32_t', since it makes no sense to have an alignment larger than the maximum write, and less painful to use an unsigned type with well-defined behavior in bit operations than to have to worry about what happens if a driver mistakenly supplies a negative alignment. Add an assert that no one was trying to use sectors to get a write zeroes larger than 2G, and therefore that a later conversion to bytes won't be impacted by keeping the limit at 32 bits. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-06-07block: Move BlockRequest type to io.cEric Blake
I was thrown by the fact that the public type BlockRequest had an anonymous union, but no obvious discriminator. Turns out that the only client of the second branch of the union was code internal to io.c, now that commit 91c6e4b killed public multiwrite, so move it into io.c and improve the comments. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1463699150-19445-1-git-send-email-eblake@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>
2016-06-07iostatus: fix comments for block_job_iostatus_resetChanglong Xie
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com> Message-id: 1464600491-23340-1-git-send-email-xiecl.fnst@cn.fujitsu.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-06-07block/io: Remove unused bdrv_aio_write_zeroes()Kevin Wolf
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1464599852-15392-1-git-send-email-kwolf@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-25blockjob: Remove BlockJob.bsKevin Wolf
There is a single remaining user in qemu-img, and another one in a test case, both of which can be trivially converted to using BlockJob.blk instead. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2016-05-25backup: Use BlockBackend for I/OKevin Wolf
This changes the backup block job to use the job's BlockBackend for performing its I/O. job->bs isn't used by the backup code any more afterwards. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-25stream: Use BlockBackend for I/OKevin Wolf
This changes the streaming block job to use the job's BlockBackend for performing the COR reads. job->bs isn't used by the streaming code any more afterwards. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-25block: Convert block job core to BlockBackendKevin Wolf
This adds a new BlockBackend field to the BlockJob struct, which coexists with the BlockDriverState while converting the individual jobs. When creating a block job, a new BlockBackend is created on top of the given BlockDriverState, and it is destroyed when the BlockJob ends. The reference to the BDS is now held by the BlockBackend instead of calling bdrv_ref/unref manually. We have to be careful when we use bdrv_replace_in_backing_chain() in block jobs because this changes the BDS that job->blk points to. At the moment block jobs are too tightly coupled with their BDS, so that moving a job to another BDS isn't easily possible; therefore, we need to just manually undo this change afterwards. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-25block: Cancel jobs first in bdrv_close_all()Kevin Wolf
So far, bdrv_close_all() first removed all root BlockDriverStates of BlockBackends and monitor owned BDSes, and then assumed that the remaining BDSes must be related to jobs and cancelled these jobs. This order doesn't work that well any more when block jobs use BlockBackends internally because then they will lose their BDS before being cancelled. This patch changes bdrv_close_all() to first cancel all jobs and then remove all root BDSes from the remaining BBs. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-25block: keep a list of block jobsAlberto Garcia
The current way to obtain the list of existing block jobs is to iterate over all root nodes and check which ones own a job. Since we want to be able to support block jobs in other nodes as well, this patch keeps a list of jobs that is updated every time one is created or destroyed. Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-25block: Fix reconfiguring graph with drained nodesKevin Wolf
When changing the BlockDriverState that a BdrvChild points to while the node is currently drained, we must call the .drained_end() parent callback. Conversely, when this means attaching a new node that is already drained, we need to call .drained_begin(). bdrv_root_attach_child() takes now an opaque parameter, which is needed because the callbacks must also be called if we're attaching a new child to the BlockBackend when the root node is already drained, and they need a way to identify the BlockBackend. Previously, child->opaque was set too late and the callbacks would still see it as NULL. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-25block: Make bdrv_open() return a BDSMax Reitz
There are no callers to bdrv_open() or bdrv_open_inherit() left that pass a pointer to a non-NULL BDS pointer as the first argument of these functions, so we can finally drop that parameter and just make them return the new BDS. Generally, the following pattern is applied: bs = NULL; ret = bdrv_open(&bs, ..., &local_err); if (ret < 0) { error_propagate(errp, local_err); ... } by bs = bdrv_open(..., errp); if (!bs) { ret = -EINVAL; ... } Of course, there are only a few instances where the pattern is really pure. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-25block: Drop bdrv_new_root()Max Reitz
It is unused now, so we may just as well drop it. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-25block: Fix bdrv_next() memory leakKevin Wolf
The bdrv_next() users all leaked the BdrvNextIterator after completing the iteration. Simply changing bdrv_next() to free the iterator before returning NULL at the end of list doesn't work because some callers exit the loop before looking at all BDSes. This patch moves the BdrvNextIterator from the heap to the stack of the caller and switches to a bdrv_first()/bdrv_next() interface for initialising the iterator. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-19block: Remove BlockDriverState.blkKevin Wolf
This patch removes the remaining users of bs->blk, which will allow us to have multiple BBs on top of a single BDS. In the meantime, all checks that are currently in place to prevent the user from creating such setups can be switched to bdrv_has_blk() instead of accessing BDS.blk. Future patches can allow them and e.g. enable users to mirror to a block device that already has a BlockBackend on it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-19block: Avoid bs->blk in bdrv_next()Kevin Wolf
We need to introduce a separate BdrvNextIterator struct that can keep more state than just the current BDS in order to avoid using the bs->blk pointer. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-19block: Remove bdrv_aio_multiwrite()Kevin Wolf
Since virtio-blk implements request merging itself these days, the only remaining users are test cases for the function. That doesn't make the function exactly useful any more. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2016-05-19blockjob: Don't set iostatus of targetKevin Wolf
When block job errors were introduced, we assigned the iostatus of the target BDS "just in case". The field has never been accessible for the user because the target isn't listed in query-block. Before we can allow the user to have a second BlockBackend on the target, we need to clean this up. If anything, we would want to set the iostatus for the internal BB of the job (which we can always do later), but certainly not for a separate BB which the job doesn't even use. As a nice side effect, this gets us rid of another bs->blk use. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-19block: User BdrvChild callback for device nameKevin Wolf
In order to get rid of bs->blk for bdrv_get_device_name() and bdrv_get_device_or_node_name(), ask all parents for their name and simply pick the first one. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-19block: Use BdrvChild callbacks for change_media/resizeKevin Wolf
We want to get rid of BlockDriverState.blk in order to allow multiple BlockBackends per BDS. Converting the device callbacks in block.c (which assume a single BlockBackend) to per-child callbacks gets us rid of the first few instances. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-19block: Decouple throttling from BlockDriverStateKevin Wolf
This moves the throttling related part of the BDS life cycle management to BlockBackend. The throttling group reference is now kept even when no medium is inserted. With this commit, throttling isn't disabled and then re-enabled any more during graph reconfiguration. This fixes the temporary breakage of I/O throttling when used with live snapshots or block jobs that manipulate the graph. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19block: Drain throttling queue with BdrvChild callbackKevin Wolf
This removes the last part of I/O throttling from block/io.c and moves it to the BlockBackend. Instead of having knowledge about throttling inside io.c, we can call a BdrvChild callback .drained_begin/end, which happens to drain the throttled requests for BlockBackend parents. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19block: Introduce BdrvChild.opaqueKevin Wolf
BlockBackends use it to get a back pointer from BdrvChild to BlockBackend in any BdrvChildRole callbacks. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19block: Move I/O throttling configuration functions to BlockBackendKevin Wolf
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19block: Move actual I/O throttling to BlockBackendKevin Wolf
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19block: Move throttling fields from BDS to BBKevin Wolf
This patch changes where the throttling state is stored (used to be the BlockDriverState, now it is the BlockBackend), but it doesn't actually make it a BB level feature yet. For example, throttling is still disabled when the BDS is detached from the BB. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19block: Convert throttle_group_get_name() to BlockBackendKevin Wolf
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19block: throttle-groups: Use BlockBackend pointers internallyKevin Wolf
As a first step towards moving I/O throttling to the BlockBackend level, this patch changes all pointers in struct ThrottleGroup from referencing a BlockDriverState to referencing a BlockBackend. This change is valid because we made sure that throttling can only be enabled on BDSes which have a BB attached. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-12quorum: implement bdrv_add_child() and bdrv_del_child()Wen Congyang
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Gonglei <arei.gonglei@huawei.com> Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com> Message-id: 1462865799-19402-3-git-send-email-xiecl.fnst@cn.fujitsu.com Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-05-12Add new block driver interface to add/delete a BDS's childWen Congyang
In some cases, we want to take a quorum child offline, and take another child online. Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Gonglei <arei.gonglei@huawei.com> Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 1462865799-19402-2-git-send-email-xiecl.fnst@cn.fujitsu.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-05-12block: Honor BDRV_REQ_FUA during write_zeroesEric Blake
The block layer has a couple of cases where it can lose Force Unit Access semantics when writing a large block of zeroes, such that the request returns before the zeroes have been guaranteed to land on underlying media. SCSI does not support FUA during WRITESAME(10/16); FUA is only supported if it falls back to WRITE(10/16). But where the underlying device is new enough to not need a fallback, it means that any upper layer request with FUA semantics was silently ignoring BDRV_REQ_FUA. Conversely, NBD has situations where it can support FUA but not ZERO_WRITE; when that happens, the generic block layer fallback to bdrv_driver_pwritev() (or the older bdrv_co_writev() in qemu 2.6) was losing the FUA flag. The problem of losing flags unrelated to ZERO_WRITE has been latent in bdrv_co_do_write_zeroes() since commit aa7bfbff, but back then, it did not matter because there was no FUA flag. It became observable when commit 93f5e6d8 paved the way for flags that can impact correctness, when we should have been using bdrv_co_writev_flags() with modified flags. Compare to commit 9eeb6dd, which got flag manipulation right in bdrv_co_do_zero_pwritev(). Symptoms: I tested with qemu-io with default writethrough cache (which is supposed to use FUA semantics on every write), and targetted an NBD client connected to a server that intentionally did not advertise NBD_FLAG_SEND_FUA. When doing 'write 0 512', the NBD client sent two operations (NBD_CMD_WRITE then NBD_CMD_FLUSH) to get the fallback FUA semantics; but when doing 'write -z 0 512', the NBD client sent only NBD_CMD_WRITE. The fix is do to a cleanup bdrv_co_flush() at the end of the operation if any step in the middle relied on a BDS that does not natively support FUA for that step (note that we don't need to flush after every operation, if the operation is broken into chunks based on bounce-buffer sizing). Each BDS gains a new flag .supported_zero_flags, which parallels the use of .supported_write_flags but only when accessing a zero write operation (the flags MUST be different, because of SCSI having different semantics based on WRITE vs. WRITESAME; and also because BDRV_REQ_MAY_UNMAP only makes sense on zero writes). Also fix some documentation to describe -ENOTSUP semantics, particularly since iscsi depends on those semantics. Down the road, we may want to add a driver where its .bdrv_co_pwritev() honors all three of BDRV_REQ_FUA, BDRV_REQ_ZERO_WRITE, and BDRV_REQ_MAY_UNMAP, and advertise this via bs->supported_write_flags for blocks opened by that driver; such a driver should NOT supply .bdrv_co_write_zeroes nor .supported_zero_flags. But none of the drivers touched in this patch want to do that (the act of writing zeroes is different enough from normal writes to deserve a second callback). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12block: Make supported_write_flags a per-bds propertyEric Blake
Pre-patch, .supported_write_flags lives at the driver level, which means we are blindly declaring that all block devices using a given driver will either equally support FUA, or that we need a fallback at the block layer. But there are drivers where FUA support is a per-block decision: the NBD block driver is dependent on the remote server advertising NBD_FLAG_SEND_FUA (and has fallback code to duplicate the flush that the block layer would do if NBD had not set .supported_write_flags); and the iscsi block driver is dependent on the mode sense bits advertised by the underlying device (and is currently silently ignoring FUA requests if the underlying device does not support FUA). The fix is to make supported flags as a per-BDS option, set during .bdrv_open(). This patch moves the variable and fixes NBD and iscsi to set it only conditionally; later patches will then further simplify the NBD driver to quit duplicating work done at the block layer, as well as tackle the fact that SCSI does not support FUA semantics on WRITESAME(10/16) but only on WRITE(10/16). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12Allow users to specify the vmdk virtual hardware version.Janne Karhunen
Vmdk images have metadata to indicate the vmware virtual hardware version image was created/tested to run with. Allow users to specify that version via new 'hwversion' option. [ kwolf: Adjust qemu-iotests common.filter ] Signed-off-by: Janne Karhunen <Janne.Karhunen@gmail.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>