aboutsummaryrefslogtreecommitdiff
path: root/include/block/block_int.h
AgeCommit message (Collapse)Author
2014-12-10block: Make essential BlockDriver objects publicMax Reitz
There are some block drivers which are essential to QEMU and may not be removed: These are raw, file and qcow2 (as the default non-raw format). Make their BlockDriver objects public so they can be directly referenced throughout the block layer without needing to call bdrv_find_format() and having to deal with an error at runtime, while the real problem occurred during linking (where raw, file or qcow2 were not linked into qemu). Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10raw: Prohibit dangerous writes for probed imagesKevin Wolf
If the user neglects to specify the image format, QEMU probes the image to guess it automatically, for convenience. Relying on format probing is insecure for raw images (CVE-2008-2004). If the guest writes a suitable header to the device, the next probe will recognize a format chosen by the guest. A malicious guest can abuse this to gain access to host files, e.g. by crafting a QCOW2 header with backing file /etc/shadow. Commit 1e72d3b (April 2008) provided -drive parameter format to let users disable probing. Commit f965509 (March 2009) extended QCOW2 to optionally store the backing file format, to let users disable backing file probing. QED has had a flag to suppress probing since the beginning (2010), set whenever a raw backing file is assigned. All of these additions that allow to avoid format probing have to be specified explicitly. The default still allows the attack. In order to fix this, commit 79368c8 (July 2010) put probed raw images in a restricted mode, in which they wouldn't be able to overwrite the first few bytes of the image so that they would identify as a different image. If a write to the first sector would write one of the signatures of another driver, qemu would instead zero out the first four bytes. This patch was later reverted in commit 8b33d9e (September 2010) because it didn't get the handling of unaligned qiov members right. Today's block layer that is based on coroutines and has qiov utility functions makes it much easier to get this functionality right, so this patch implements it. The other differences of this patch to the old one are that it doesn't silently write something different than the guest requested by zeroing out some bytes (it fails the request instead) and that it doesn't maintain a list of signatures in the raw driver (it calls the usual probe function instead). Note that this change doesn't introduce new breakage for false positive cases where the guest legitimately writes data into the first sector that matches the signatures of an image format (e.g. for nested virt): These cases were broken before, only the failure mode changes from corruption after the next restart (when the wrong format is probed) to failing the problematic write request. Also note that like in the original patch, the restrictions only apply if the image format has been guessed by probing. Explicitly specifying a format allows guests to write anything they like. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1416497234-29880-8-git-send-email-kwolf@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10block: Read only one sector for format probingKevin Wolf
The only image format driver that even potentially accesses anything after 512 bytes in its bdrv_probe() implementation is VMDK, which reads a plain-text descriptor file. In practice, the field it's looking for seems to come first and will be well within the first 512 bytes, too. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 1416497234-29880-7-git-send-email-kwolf@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-11-03block: Add status callback to bdrv_amend_options()Max Reitz
Depending on the changed options and the image format, bdrv_amend_options() may take a significant amount of time. In these cases, a way to be informed about the operation's status is desirable. Since the operation is rather complex and may fundamentally change the image, implementing it as AIO or a coroutine does not seem feasible. On the other hand, implementing it as a block job would be significantly more difficult than a simple callback and would not add benefits other than progress report to the amending operation, because it should not actually be run as a block job at all. A callback may not be very pretty, but it's very easy to implement and perfectly fits its purpose here. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1414404776-4919-2-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-11-03BlockLimits: introduce max_transfer_lengthPeter Lieven
Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-10-20block: Lift device model API into BlockBackendMarkus Armbruster
Move device model attachment / detachment and the BlockDevOps device model callbacks and their wrappers from BlockDriverState to BlockBackend. Wrapper calls in block.c change from bdrv_dev_FOO_cb(bs, ...) to if (bs->blk) { bdrv_dev_FOO_cb(bs->blk, ...); } No change, because both bdrv_dev_change_media_cb() and bdrv_dev_resize_cb() do nothing when no device model is attached, and a device model can be attached only when bs->blk. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20block: Rename BlockDriverCompletionFunc to BlockCompletionFuncMarkus Armbruster
I'll use it with block backends shortly, and the name is going to fit badly there. It's a block layer thing anyway, not just a block driver thing. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20block: Rename BlockDriverAIOCB* to BlockAIOCB*Markus Armbruster
I'll use BlockDriverAIOCB with block backends shortly, and the name is going to fit badly there. It's a block layer thing anyway, not just a block driver thing. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20block: Eliminate BlockDriverState member device_name[]Markus Armbruster
device_name[] can become non-empty only in bdrv_new_root() and bdrv_move_feature_fields(). The latter is used only to undo damage done by bdrv_swap(). The former is called only by blk_new_with_bs(). Therefore, when a BlockDriverState's device_name[] is non-empty, then it's been created with a BlockBackend, and vice versa. Furthermore, blk_new_with_bs() keeps the two names equal. Therefore, device_name[] is redundant. Eliminate it. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20block: Connect BlockBackend to BlockDriverStateMarkus Armbruster
Convenience function blk_new_with_bs() creates a BlockBackend with its BlockDriverState. Callers have to unref both. The commit after next will relieve them of the need to unref the BlockDriverState. Complication: due to the silly way drive_del works, we need a way to hide a BlockBackend, just like bdrv_make_anon(). To emphasize its "special" status, give the function a suitably off-putting name: blk_hide_on_behalf_of_do_drive_del(). Unfortunately, hiding turns the BlockBackend's name into the empty string. Can't avoid that without breaking the blk->bs->device_name equals blk->name invariant. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-09-10block: Extract the block accounting codeBenoît Canet
The plan is to add new accounting metrics (latency, invalid requests, failed requests, queue depth) and block.c is overpopulated so it will be better to work in a separate module. Moreover the long term plan is to have statistics in each of the BDS of the graph for metrology purpose; this means that the device model statistics must move from the topmost BDS to the device model. So we need to decouple the statistic code from BlockDriverState. This is another argument for the extraction of the code in a separate module. CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Max Reitz <mreitz@redhat.com> CC: Eric Blake <eblake@redhat.com> CC: Benoit Canet <benoit@irqsave.net> CC: Fam Zheng <famz@redhat.com> CC: Peter Crosthwaite <peter.crosthwaite@xilinx.com> CC: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Benoît Canet <benoit.canet@nodalink.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-09-10block: Extract the BlockAcctStats structureBenoît Canet
Extract the block accounting statistics into a structure so the block device models can hold them in the future. CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Max Reitz <mreitz@redhat.com> CC: Eric Blake <eblake@redhat.com> Signed-off-by: Benoît Canet <benoit.canet@nodalink.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-29block: Add AIO context notifiersMax Reitz
If a long-running operation on a BDS wants to always remain in the same AIO context, it somehow needs to keep track of the BDS changing its context. This adds a function for registering callbacks on a BDS which are called whenever the BDS is attached or detached from an AIO context. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-20block: Add bdrv_refresh_filename()Max Reitz
Some block devices may not have a filename in their BDS; and for some, there may not even be a normal filename at all. To work around this, add a function which tries to construct a valid filename for the BDS.filename field. If a filename exists or a block driver is able to reconstruct a valid filename (which is placed in BDS.exact_filename), this can directly be used. If no filename can be constructed, we can still construct an options QDict which is then converted to a JSON object and prefixed with the "json:" pseudo protocol prefix. The QDict is placed in BDS.full_open_options. For most block drivers, this process can be done automatically; those that need special handling may define a .bdrv_refresh_filename() method to fill BDS.exact_filename and BDS.full_open_options themselves. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-07-18block: Add Error argument to bdrv_refresh_limits()Kevin Wolf
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-07-07block: block: introduce APIs for submitting IO as a batchMing Lei
This patch introduces three APIs so that following patches can support queuing I/O requests and submitting them as a batch for improving I/O performance. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-07-01block: extend block-commit to accept a string for the backing fileJeff Cody
On some image chains, QEMU may not always be able to resolve the filenames properly, when updating the backing file of an image after a block commit. For instance, certain relative pathnames may fail, or drives may have been specified originally by file descriptor (e.g. /dev/fd/???), or a relative protocol pathname may have been used. In these instances, QEMU may lack the information to be able to make the correct choice, but the user or management layer most likely does have that knowledge. With this extension to the block-commit api, the user is able to change the backing file of the overlay image as part of the block-commit operation. This allows the change to be 'safe', in the sense that if the attempt to write the overlay image metadata fails, then the block-commit operation returns failure, without disrupting the guest. If the commit top is the active layer, then specifying the backing file string will be treated as an error (there is no overlay image to modify in that case). If a backing file string is not specified in the command, the backing file string to use is determined in the same manner as it was previously. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-07-01qemu-img create: add 'nocow' optionChunyan Liu
Add 'nocow' option so that users could have a chance to set NOCOW flag to newly created files. It's useful on btrfs file system to enhance performance. Btrfs has low performance when hosting VM images, even more when the guest in those VM are also using btrfs as file system. One way to mitigate this bad performance is to turn off COW attributes on VM files. Generally, there are two ways to turn off NOCOW on btrfs: a) by mounting fs with nodatacow, then all newly created files will be NOCOW. b) per file. Add the NOCOW file attribute. It could only be done to empty or new files. This patch tries the second way, according to the option, it could add NOCOW per file. For most block drivers, since the create file step is in raw-posix.c, so we can do setting NOCOW flag ioctl in raw-posix.c only. But there are some exceptions, like block/vpc.c and block/vdi.c, they are creating file by calling qemu_open directly. For them, do the same setting NOCOW flag ioctl work in them separately. [Fixed up 082.out due to the new 'nocow' creation option --Stefan] Signed-off-by: Chunyan Liu <cyliu@suse.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-06-27block: Add replaces argument to drive-mirrorBenoît Canet
drive-mirror will bdrv_swap the new BDS named node-name with the one pointed by replaces when the mirroring is finished. Signed-off-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-06-26block: Catch backing files assigned to non-COW driversKevin Wolf
Since we parse backing.* options to add a backing file from the command line when the driver didn't assign one, it has been possible to have a backing file for e.g. raw images (it just was never accessed). This is obvious nonsense and should be rejected. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2014-06-23qapi event: convert BLOCK_IO_ERROR and BLOCK_JOB_ERRORWenchao Xia
Signed-off-by: Wenchao Xia <wenchaoqemu@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2014-06-16cleanup QEMUOptionParameterChunyan Liu
Now that all backend drivers are using QemuOpts, remove all QEMUOptionParameter related codes. Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com> Signed-off-by: Chunyan Liu <cyliu@suse.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-06-16change block layer to support both QemuOpts and QEMUOptionParamterChunyan Liu
Change block layer to support both QemuOpts and QEMUOptionParameter. After this patch, it will change backend drivers one by one. At the end, QEMUOptionParameter will be removed and only QemuOpts is kept. Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com> Signed-off-by: Chunyan Liu <cyliu@suse.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-06-04block: Move declaration of bdrv_get_aio_context to block.hFam Zheng
block_int.h is for block layer and block drivers, other code shouldn't include it. But similar to bdrv_set_aio_context, bdrv_get_aio_context should also be accessible from outside of block layer. Move it. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-06-04block: add bdrv_set_aio_context()Stefan Hajnoczi
Up until now all BlockDriverState instances have used the QEMU main loop for fd handlers, timers, and BHs. This is not scalable on SMP guests and hosts so we need to move to a model with multiple event loops on different host CPUs. bdrv_set_aio_context() assigns the AioContext event loop to use for a particular BlockDriverState. It first detaches the entire BlockDriverState graph from the current AioContext and then attaches to the new AioContext. This function will be used by virtio-blk data-plane to assign a BlockDriverState to its IOThread AioContext. Make bdrv_aio_set_context() public since data-plane should not include block_int.h. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-05-28block: Add backing_blocker in BlockDriverStateFam Zheng
This makes use of op_blocker and blocks all the operations except for commit target, on each BlockDriverState->backing_hd. The asserts for op_blocker in bdrv_swap are removed because with this change, the target of block commit has at least the backing blocker of its child, so the assertion is not true. Callers should do their check. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-05-28block: Replace in_use with operation blockerFam Zheng
This drops BlockDriverState.in_use with op_blockers: - Call bdrv_op_block_all in place of bdrv_set_in_use(bs, 1). - Call bdrv_op_unblock_all in place of bdrv_set_in_use(bs, 0). - Check bdrv_op_is_blocked() in place of bdrv_in_use(bs). The specific types are used, e.g. in place of starting block backup, bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKUP, ...). There is one exception in block_job_create, where bdrv_op_blocker_is_empty() is used, because we don't know the operation type here. This doesn't matter because in a few commits away we will drop the check and move it to callers that _do_ know the type. - Check bdrv_op_blocker_is_empty() in place of assert(!bs->in_use). Note: there is only bdrv_op_block_all and bdrv_op_unblock_all callers at this moment. So although the checks are specific to op types, this changes can still be seen as identical logic with previously with in_use. The difference is error message are improved because of blocker error info. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-05-28block: Introduce op_blockers to BlockDriverStateFam Zheng
BlockDriverState.op_blockers is an array of lists with BLOCK_OP_TYPE_MAX elements. Each list is a list of blockers of an operation type (BlockOpType), that marks this BDS as currently blocked for a certain type of operation with reason errors stored in the list. The rule of usage is: * BDS user who wants to take an operation should check if there's any blocker of the type with bdrv_op_is_blocked(). * BDS user who wants to block certain types of operation, should call bdrv_op_block (or bdrv_op_block_all to block all types of operations, which is similar to the existing bdrv_set_in_use()). * A blocker is only referenced by op_blockers, so the lifecycle is managed by caller, and shouldn't be lost until unblock, so typically a caller does these: - Allocate a blocker with error_setg or similar, call bdrv_op_block() to block some operations. - Hold the blocker, do his job. - Unblock operations that it blocked, with the same reason pointer passed to bdrv_op_unblock(). - Release the blocker with error_free(). Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-05-19block: optimize zero writes with bdrv_write_zeroesPeter Lieven
this patch tries to optimize zero write requests by automatically using bdrv_write_zeroes if it is supported by the format. This significantly speeds up file system initialization and should speed zero write test used to test backend storage performance. I ran the following 2 tests on my internal SSD with a 50G QCOW2 container and on an attached iSCSI storage. a) mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 /dev/vdX QCOW2 [off] [on] [unmap] ----- runtime: 14secs 1.1secs 1.1secs filesize: 937M 18M 18M iSCSI [off] [on] [unmap] ---- runtime: 9.3s 0.9s 0.9s b) dd if=/dev/zero of=/dev/vdX bs=1M oflag=direct QCOW2 [off] [on] [unmap] ----- runtime: 246secs 18secs 18secs filesize: 51G 192K 192K throughput: 203M/s 2.3G/s 2.3G/s iSCSI* [off] [on] [unmap] ---- runtime: 8mins 45secs 33secs throughput: 106M/s 1.2G/s 1.6G/s allocated: 100% 100% 0% * The storage was connected via an 1Gbit interface. It seems to internally handle writing zeroes via WRITESAME16 very fast. Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-04-30block: Unlink temporary files in raw-posix/win32Kevin Wolf
Instead of having unlink() calls in the generic block layer, where we aren't even guarateed to have a file name, move them to those block drivers that are actually used and that always have a filename. Gets us rid of some #ifdefs as well. The patch also converts bs->is_temporary to a new BDRV_O_TEMPORARY open flag so that it is inherited in the protocol layer and the raw-posix and raw-win32 drivers can unlink the file. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-03-19block: Add error handling to bdrv_invalidate_cache()Kevin Wolf
If it returns an error, the migrated VM will not be started, but qemu exits with an error message. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-03-13block: Rewrite the snapshot authorization mechanism for block filters.Benoît Canet
This patch keep the recursive way of doing things but simplify it by giving two responsabilities to all block filters implementors. They will need to do two things: -Set the is_filter field of their block driver to true. -Implement the bdrv_recurse_is_first_non_filter method of their block driver like it is done on the Quorum block driver. (block/quorum.c) [Paolo Bonzini <pbonzini@redhat.com> pointed out that this patch changes the semantics of blkverify, which now recurses down both bs->file and s->test_file. -- Stefan] Reported-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-01-24block: Allow wait_serialising_requests() at any pointKevin Wolf
We can only have a single wait_serialising_requests() call per request because otherwise we can run into deadlocks where requests are waiting for each other. The same is true when wait_serialising_requests() is not at the very beginning of a request, so that other requests can be issued between the start of the tracking and wait_serialising_requests(). Fix this by changing wait_serialising_requests() to ignore requests that are already (directly or indirectly) waiting for the calling request. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24block: Make overlap range for serialisation dynamicKevin Wolf
Copy on Read wants to serialise with all requests touching the same cluster, so wait_serialising_requests() rounded to cluster boundaries. Other users like alignment RMW will have different requirements, though (requests touching the same sector), so make it dynamic. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24block: Generalise and optimise COR serialisationKevin Wolf
Change the API so that specific requests can be marked serialising. Only these requests are checked for overlaps then. This means that during a Copy on Read operation, not all requests overlapping other requests are serialised any more, but only those that actually overlap with the specific COR request. Also remove COR from function and variable names because this functionality can be useful in other contexts. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24block: Switch BdrvTrackedRequest to byte granularityKevin Wolf
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24raw: Probe required direct I/O alignmentPaolo Bonzini
Add a bs->request_alignment field that contains the required offset/length alignment for I/O requests and fill it in the raw block drivers. Use ioctls if possible, else see what alignment it takes for O_DIRECT to succeed. While at it, also expose the memory alignment requirements, which may be (and in practice are) different from the disk alignment requirements. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-01-24block: rename buffer_alignment to guest_block_sizePaolo Bonzini
The alignment field is now set to the value that is promised to the guest, rather than required by the host. The next patches will make QEMU aware of the host-provided values, so make this clear. The alignment is also not about memory buffers, but about the sectors on the disk, change the documentation of the field. At this point, the field is set by the device emulation, but completely ignored by the block layer. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24block: Don't use guest sector size for qemu_blockalign()Kevin Wolf
bs->buffer_alignment is set by the device emulation and contains the logical block size of the guest device. This isn't something that the block layer should know, and even less something to use for determining the right alignment of buffers to be used for the host. The new BlockLimits field opt_mem_alignment tells the qemu block layer the optimal alignment to be used so that no bounce buffer must be used in the driver. This patch may change the buffer alignment from 4k to 512 for all callers that used qemu_blockalign() with the top-level image format BlockDriverState. The value was never propagated to other levels in the tree, so in particular raw-posix never required anything else than 512. While on disks with 4k sectors direct I/O requires a 4k alignment, memory may still be okay when aligned to 512 byte boundaries. This is what must have happened in practice, because otherwise this would already have failed earlier. Therefore I don't expect regressions even with this intermediate state. Later, raw-posix can implement the hook and expose a different memory alignment requirement. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-01-24block: Move initialisation of BlockLimits to bdrv_refresh_limits()Kevin Wolf
This function separates filling the BlockLimits from bdrv_open(), which allows it to call it from other operations which may change the limits (e.g. modifications to the backing file chain or bdrv_reopen) Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24block: Create authorizations mechanism for external snapshot and resize.Benoît Canet
Signed-off-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-01-24block: Add bs->node_name to hold the name of a bs node of the bs graph.Benoît Canet
Add the minimum of code to prepare for the following patches. Signed-off-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-12-20block: Add commit_active_start()Fam Zheng
commit_active_start is implemented in block/mirror.c, It will create a job with "commit" type and designated base in block-commit command. This will be used for committing active layer of device. Sync mode is removed from MirrorBlockJob because there's no proper type for commit. The used information is is_none_mode. The common part of mirror_start and commit_active_start is moved to mirror_start_job(). Fix the comment wording for commit_start. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-12-05block: add opt_transfer_length to BlockLimitsPeter Lieven
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-12-04snapshot: distinguish id and name in load_tmpWenchao Xia
Since later this function will be used so improve it. The only caller of it now is qemu-img, and it is not impacted by introduce function bdrv_snapshot_load_tmp_by_id_or_name() that call bdrv_snapshot_load_tmp() twice to keep old search logic. bdrv_snapshot_load_tmp_by_id_or_name() return int to let caller know the errno, and errno will be used later. Also fix a typo in comments of bdrv_snapshot_delete(). Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-11-29blkdebug: add "remove_break" commandFam Zheng
This adds "remove_break" command which is the reverse of blkdebug command "break": it removes all breakpoints with given tag and resumes all the requests. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-11-29sheepdog: support user-defined redundancy optionLiu Yuan
Sheepdog support two kinds of redundancy, full replication and erasure coding. # create a fully replicated vdi with x copies -o redundancy=x (1 <= x <= SD_MAX_COPIES) # create a erasure coded vdi with x data strips and y parity strips -o redundancy=x:y (x must be one of {2,4,8,16} and 1 <= y < SD_EC_MAX_STRIP) E.g, to convert a vdi into sheepdog vdi 'test' with 8:3 erasure coding scheme $ qemu-img convert -o redundancy=8:3 linux-0.2.img sheepdog:test Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Liu Yuan <namei.unix@gmail.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-11-29block: per caller dirty bitmapFam Zheng
Previously a BlockDriverState has only one dirty bitmap, so only one caller (e.g. a block job) can keep track of writing. This changes the dirty bitmap to a list and creates a BdrvDirtyBitmap for each caller, the lifecycle is managed with these new functions: bdrv_create_dirty_bitmap bdrv_release_dirty_bitmap Where BdrvDirtyBitmap is a linked list wrapper structure of HBitmap. In place of bdrv_set_dirty_tracking, a BdrvDirtyBitmap pointer argument is added to these functions, since each caller has its own dirty bitmap: bdrv_get_dirty bdrv_dirty_iter_init bdrv_get_dirty_count bdrv_set_dirty and bdrv_reset_dirty prototypes are unchanged but will internally walk the list of all dirty bitmaps and set them one by one. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-11-28block: add BlockLimits structure to BlockDriverStatePeter Lieven
this patch adds BlockLimits which introduces discard and write_zeroes limits and alignment information to the BlockDriverState. Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-11-28block: add flags to bdrv_*_write_zeroesPeter Lieven
Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>