aboutsummaryrefslogtreecommitdiff
path: root/block
AgeCommit message (Collapse)Author
2012-05-14qcow2: Don't ignore failure to clear autoclear flagsKevin Wolf
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10block: fix warning introduced in efcc7a23Anthony Liguori
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-05-10stream: do not copy unallocated sectors from the basePaolo Bonzini
Unallocated sectors should really never be accessed by the guest, so there's no need to copy them during the streaming process. If they are read by the guest during streaming, guest-initiated copy-on-read will copy them (we're in the base == NULL case, which enables copy on read). If they are read after we disconnect the image from the base, they will read as zeroes anyway. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10stream: fix ratelimiting corner casePaolo Bonzini
This fixes inability to make progress in streaming if the quota is set to less than the amount of data that an I/O operation has to write. In this case, limit->dispatched + n will always be above the quota and, due to the "goto retry" to recheck cancellation and allocation, streaming will livelock. This can be reproduced with "block_job_set_speed ide0-hd0 1b". Of course, with this patch the requested limit will not be obeyed. That could be done with another patch that caps is_allocated's n argument by the slice quota. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10stream: pass new base image format to bdrv_change_backing_filePaolo Bonzini
When an image is modified to point to the new backing file, the backing file format is set to NULL, which means auto-probe. This is wrong, in fact it is a small security problem. Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10block: wait for job callback in block_job_cancel_syncPaolo Bonzini
The limitation on not having I/O after cancellation cannot really be kept. Even streaming has a very small race window where you could cancel a job and have it report completion. If this window is hit, bdrv_change_backing_file() will yield and possibly cause accesses to dangling pointers etc. So, let's just assume that we cannot know exactly what will happen after the coroutine has set busy to false. We can set a very lax condition: - if we cancel the job, the coroutine won't set it to false again (and hence will not call co_sleep_ns again). - block_job_cancel_sync will wait for the coroutine to exit, which pretty much ensures no race. Instead, we track the coroutine that executes the job and put very strict conditions on what to do while it is quiescent (busy = false). First of all, the coroutine must never set busy = false while the job has been cancelled. Second, the coroutine can be reentered arbitrarily while it is quiescent, so you cannot really do anything but co_sleep_ns at that time. This condition is obeyed by the block_job_sleep_ns function. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10block: add block_job_sleep_nsPaolo Bonzini
This function abstracts the pretty complex semantics of the "busy" member of BlockJob. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10block: fix snapshot on QEDPaolo Bonzini
QED's opaque data includes a pointer back to the BlockDriverState. This breaks when bdrv_append shuffles data between bs_new and bs_top. To avoid this, add a "rebind" function that tells the driver about the new relationship between the BlockDriverState and its opaque. The patch also adds rebind to VVFAT for completeness, even though it is not used with live snapshots. Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10block: update in-memory backing file and formatPaolo Bonzini
These are needed to print "info block" output correctly. QCOW2 does this because it needs it to write the header, but QED does not, and common code is the right place to do it. Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10block: push bdrv_change_backing_file error checking up from driversPaolo Bonzini
This check applies to all drivers, but QED lacks it. Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-08Merge remote-tracking branch 'kwolf/for-anthony' into stagingAnthony Liguori
* kwolf/for-anthony: fdc: simplify media change handling qcow2: lock on prealloc block: make bdrv_create adopt coroutine qcow2: Limit COW to where it's needed sheepdog: switch to writethrough mode if cluster doesn't support flush
2012-05-07qcow2: lock on preallocZhi Yong Wu
preallocate() will be locked. This is required because qcow2_alloc_cluster_link_l2() assumes that it runs under a lock that it can drop while COW is being performed. Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-07qcow2: Limit COW to where it's neededKevin Wolf
This fixes a regression introduced in commit 250196f1. The bug leads to data corruption, found during an Autotest run with a Fedora 8 guest. Consider a write request whose first part is covered by an already allocated cluster, but additional clusters need to be newly allocated. When counting the number of clusters to allocate, the qcow2 code would decide to do COW for all remaining clusters of the write request, even if some of them are already allocated. If during this COW operation another write request is issued that touches the same cluster, it will still refer to the old cluster. When the COW completes, the first request will update the L2 table and the second write request will be lost. Note that the requests need not overlap, it's enough for them to touch the same cluster. This patch ensures that only clusters that really require COW are considered for allocation. In this case any other request writing to the same cluster will be an allocating write and gets serialised. Reported-by: Marcelo Tosatti <mtosatti@redhat.com> Tested-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-07sheepdog: switch to writethrough mode if cluster doesn't support flushMORITA Kazutaka
This is necessary for qemu to work with the older version of Sheepdog which doesn't support SD_OP_FLUSH_VDI. Signed-off-by: MORITA Kazutaka <morita.kazutaka@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-04ISCSI: Add support for thin-provisioning via discard/UNMAP and bigger LUNsRonnie Sahlberg
Update the configure test for libiscsi support to detect version 1.3 or later. Version 1.3 of libiscsi provides both READCAPACITY16 as well as UNMAP commands. Update the iscsi block layer to use READCAPACITY16 to detect the size of the LUN instead of READCAPACITY10. This allows support for LUNs larger than 2TB. Update to implement bdrv_aio_discard() using the UNMAP command. This allows us to use thin-provisioned LUNs from TGTD and other iSCSI targets that support thin-provisioning. Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com> [squashed in subsequent patch from Ronnie to fix off-by-one in LBA count] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-05-02rbd: add discard supportJosh Durgin
Change the write flag to an operation type in RBDAIOCB, and make the buffer optional since discard doesn't use it. Discard is first included in librbd 0.1.2 (which is in Ceph 0.46). If librbd is too old, leave out qemu_rbd_aio_discard entirely, so the old behavior is preserved. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-02qcow2: fix the return value -ENOENT -> -EEXISTZhi Yong Wu
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-02qcow2: Don't hold cache references across yieldKevin Wolf
If cache references are held while the coroutine has yielded, the cache may get used up and abort() when it can't find a free entry. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-02qcow2: Remove unused parameter in do_alloc_cluster_offsetKevin Wolf
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-02block/qcow2: Add missing GCC_FMT_ATTR to function report_unsupported()Stefan Weil
Cc: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-01raw-posix: Do not use CONFIG_COCOA macroPavel Borzenkov
Use __APPLE__ and __MACH__ macros instead of CONFIG_COCOA to detect Mac OS X host. The patch is based on Ben Leslie's patch: http://patchwork.ozlabs.org/patch/97859/ Signed-off-by: Ben Leslie <benno@benno.id.au> Signed-off-by: Pavel Borzenkov <pavel.borzenkov@gmail.com> Acked-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Andreas Färber <andreas.faerber@web.de>
2012-04-27Merge remote-tracking branch 'qmp/queue/qmp' into stagingAnthony Liguori
* qmp/queue/qmp: qapi: fix qmp_balloon() conversion qemu-iotests: add block-stream speed value test case block: add 'speed' optional parameter to block-stream block: change block-job-set-speed argument from 'value' to 'speed' block: use Error mechanism instead of -errno for block_job_set_speed() block: use Error mechanism instead of -errno for block_job_create()
2012-04-27block: add 'speed' optional parameter to block-streamStefan Hajnoczi
Allow streaming operations to be started with an initial speed limit. This eliminates the window of time between starting streaming and issuing block-job-set-speed. Users should use the new optional 'speed' parameter instead so that speed limits are in effect immediately when the job starts. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Acked-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2012-04-27block: change block-job-set-speed argument from 'value' to 'speed'Stefan Hajnoczi
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Acked-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2012-04-27block: use Error mechanism instead of -errno for block_job_set_speed()Stefan Hajnoczi
There are at least two different errors that can occur in block_job_set_speed(): the job might not support setting speeds or the value might be invalid. Use the Error mechanism to report the error where it occurs. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Acked-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2012-04-27block: use Error mechanism instead of -errno for block_job_create()Stefan Hajnoczi
The block job API uses -errno return values internally and we convert these to Error in the QMP functions. This is ugly because the Error should be created at the point where we still have all the relevant information. More importantly, it is hard to add new error cases to this case since we quickly run out of -errno values without losing information. Go ahead and use Error directly and don't convert later. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Acked-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2012-04-26nbd: Fix uninitialised use of s->sockKevin Wolf
s->sock is assigned only afterwards, so we're really registering an aio_fd_handler for file descriptor 0 here. Not exactly what we intended. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-04-23Merge remote-tracking branch 'kwolf/for-anthony' into stagingAnthony Liguori
* kwolf/for-anthony: (38 commits) qemu-iotests: Fix test 031 for qcow2 v3 support qemu-iotests: Add -o and make v3 the default for qcow2 qcow2: Zero write support qemu-iotests: Test backing file COW with zero clusters qemu-iotests: add a simple test for write_zeroes qcow2: Support for feature table header extension qcow2: Support reading zero clusters qcow2: Version 3 images qcow2: Ignore reserved bits in check_refcounts qcow2: Ignore reserved bits in refcount table entries qcow2: Simplify count_cow_clusters qcow2: Refactor qcow2_free_any_clusters qcow2: Ignore reserved bits in L1/L2 entries qcow2: Fail write_compressed when overwriting data qcow2: Ignore reserved bits in count_contiguous_clusters() qcow2: Ignore reserved bits in get_cluster_offset qcow2: Save disk size in snapshot header Specification for qcow2 version 3 qcow2: Fix refcount block allocation during qcow2_alloc_cluster_at() iotests: Resolve test failures caused by hostname ...
2012-04-20qcow2: Zero write supportKevin Wolf
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-20qcow2: Support for feature table header extensionKevin Wolf
Instead of printing an ugly bitmask, qemu can now print a more helpful string even for yet unknown features. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-20qcow2: Support reading zero clustersKevin Wolf
This adds support for reading zero clusters in version 3 images. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-20qcow2: Version 3 imagesKevin Wolf
This adds the basic infrastructure to qcow2 to handle version 3 images. It includes code to create v3 images, allow header updates for v3 images and checks feature bits. It still misses support for zero clusters, so this is not a fully compliant implementation of v3 yet. The default for creating new images stays at v2 for now. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-20qcow2: Ignore reserved bits in check_refcountsKevin Wolf
Also don't infer the cluster type directly from the L2 entries, but use qcow2_get_cluster_type() to keep everything in a single place. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-20qcow2: Ignore reserved bits in refcount table entriesKevin Wolf
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-20qcow2: Simplify count_cow_clustersKevin Wolf
count_cow_clusters() tries to reuse existing functions, and all it achieves is to make things much more complicated than they really are: Everything needs COW, unless it's a normal cluster with refcount 1. This patch implements the obvious way of doing this, and by using qcow2_get_cluster_type() it gets rid of all flag magic. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-20qcow2: Refactor qcow2_free_any_clustersKevin Wolf
Zero clusters will add another cluster type. Refactor the open-coded cluster type detection into a switch of QCOW2_CLUSTER_* options so that the detection is in a single place. This makes it easier to add new cluster types. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-20qcow2: Ignore reserved bits in L1/L2 entriesKevin Wolf
This changes the still existing places that assume that the only flags are QCOW_OFLAG_COPIED and QCOW_OFLAG_COMPRESSED to properly mask out reserved bits. It does not convert bdrv_check yet. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-20qcow2: Fail write_compressed when overwriting dataKevin Wolf
qcow2_alloc_compressed_cluster_offset() already fails if the copied flag is set, because qcow2_write_compressed() doesn't perform COW as it would have to do to allow this. However, what we really want to check here is whether the cluster is allocated or not. With internal snapshots the copied flag may not be set on allocated clusters. Check the cluster offset instead. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-20qcow2: Ignore reserved bits in count_contiguous_clusters()Kevin Wolf
Until now, count_contiguous_clusters() has an argument that allowed to specify flags that should be ignored in the comparison, i.e. that are allowed to change between contiguous clusters. This patch changes the function so that it ignores all flags by default now and you need to pass the flags on which it should stop. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-20qcow2: Ignore reserved bits in get_cluster_offsetKevin Wolf
With this change, reading from a qcow2 image ignores all reserved bits that are set in an L1 or L2 table entry. Now get_cluster_offset() assigns *cluster_offset only the offset without any other flags. The cluster type is not longer encoded in the offset, but a positive return value in case of success. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-20qcow2: Save disk size in snapshot headerKevin Wolf
This allows that different snapshots of an image can have different sizes, which is a requirement for enabling image resizing even with images that have internal snapshots. We don't do the actual support for it now, but make sure that the additional field is present and not completely ignored in all version 3 images. When trying to load a snapshot of different size, it returns an error. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-20qcow2: Fix refcount block allocation during qcow2_alloc_cluster_at()Kevin Wolf
Refcount block allocation and refcount table growth rely on s->free_cluster_index pointing to somewhere after the current allocation. Change qcow2_alloc_cluster_at() to fulfill this assumption. Without this change it could happen that a newly allocated refcount block and the allocated data block point to the same area in the image file, causing data corruption in the long run. This fixes a bug that became first visible after commit 250196f1. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-19aio: remove process_queue callback and qemu_aio_process_queuePaolo Bonzini
Both unused after the previous patch. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-19nbd: do not block in nbd_wr_sync if no data at all is availablePaolo Bonzini
Right now, nbd_wr_sync will hang if no data at all is available on the socket and the other side is not going to provide any. Relax this by making it loop only for writes or partial reads. This fixes a race where one thread is executing qemu_aio_wait() and another is executing main_loop_wait(). Then, the select() call in main_loop_wait() can return stale data and call the "readable" callback with no data in the socket. Reported-by: Laurent Vivier <laurent@vivier.eu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-04-19nbd: consistently return negative errno valuesPaolo Bonzini
In the next patch we need to look at the return code of nbd_wr_sync. To avoid percolating the socket_error() ugliness all around, let's handle errors by returning negative errno values. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-04-19nbd: consistently check for <0 or >=0Paolo Bonzini
This prepares for the following patch, which changes -1 return values to negative errno. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-04-19nbd: avoid out of bounds access to recv_coroutine arrayPaolo Bonzini
This can happen with a buggy or malicious server. Reported-by: Michael Tokarev <mjt@tls.msk.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-04-19qcow2: Fix return value of alloc_refcount_blockKevin Wolf
Someone forgot something in commit 29c1a730... Documenting the right return value is not enough, you also need to actually return it in the code. This bug sometimes causes error return values even when everything has succeeded: The new offset of the refcount block is truncated to 32 bits and interpreted as signed. At least with small cluster sizes it's easy to get a negative return value this way. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2012-04-19qcow2: Fix error handling in qcow2_alloc_cluster_offsetKevin Wolf
If do_alloc_cluster_offset() fails, the error handling code tried to remove the request from the in-flight queue, to which it wasn't added yet, resulting in a NULL pointer dereference. m->nb_clusters really only becomes != 0 when the request is in the list. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-19block: Fix spelling in comment (ineffcient -> inefficient)Stefan Weil
Signed-off-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>