aboutsummaryrefslogtreecommitdiff
path: root/block
AgeCommit message (Collapse)Author
2017-07-04nbd: fix NBD over TLSPaolo Bonzini
When attaching the NBD QIOChannel to an AioContext, the TLS channel should be used, not the underlying socket channel. This is because, trivially, the TLS channel will be the one that we read/write to and thus the one that will get the qio_channel_yield() call. Fixes: ff82911cd3f69f028f2537825c9720ff78bc3f19 Cc: qemu-stable@nongnu.org Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Tested-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-06-30block: Exploit BDRV_BLOCK_EOF for larger zero blocksEric Blake
When we have a BDS with unallocated clusters, but asking the status of its underlying bs->file or backing layer encounters an end-of-file condition, we know that the rest of the unallocated area will read as zeroes. However, pre-patch, this required two separate calls to bdrv_get_block_status(), as the first call stops at the point where the underlying file ends. Thanks to BDRV_BLOCK_EOF, we can now widen the results of the primary status if the secondary status already includes BDRV_BLOCK_ZERO. In turn, this fixes a TODO mentioned in iotest 154, where we can now see that all sectors in a partial cluster at the end of a file read as zero when coupling the shorter backing file's status along with our knowledge that the remaining sectors came from an unallocated cluster. Also, note that the loop in bdrv_co_get_block_status_above() had an inefficent exit: in cases where the active layer sets BDRV_BLOCK_ZERO but does NOT set BDRV_BLOCK_ALLOCATED (namely, where we know we read zeroes merely because our unallocated clusters lie beyond the backing file's shorter length), we still ended up probing the backing layer even though we already had a good answer. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20170505021500.19315-3-eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>
2017-06-30block: Add BDRV_BLOCK_EOF to bdrv_get_block_status()Eric Blake
Just as the block layer already sets BDRV_BLOCK_ALLOCATED as a shortcut for subsequent operations, there are also some optimizations that are made easier if we can quickly tell that *pnum will advance us to the end of a file, via a new BDRV_BLOCK_EOF which gets set by the block layer. This just plumbs up the new bit; subsequent patches will make use of it. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20170505021500.19315-2-eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>
2017-06-26block: Do not strcmp() with NULL uri->schemeMax Reitz
uri_parse(...)->scheme may be NULL. In fact, probably every field may be NULL, and the callers do test this for all of the other fields but not for scheme (except for block/gluster.c; block/vxhs.c does not access that field at all). We can easily fix this by using g_strcmp0() instead of strcmp(). Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20170613205726.13544-1-mreitz@redhat.com Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-06-26blkverify: Catch bs->exact_filename overflowMax Reitz
The bs->exact_filename field may not be sufficient to store the full blkverify node filename. In this case, we should not generate a filename at all instead of an unusable one. Cc: qemu-stable@nongnu.org Reported-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20170613172006.19685-3-mreitz@redhat.com Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-06-26blkdebug: Catch bs->exact_filename overflowMax Reitz
The bs->exact_filename field may not be sufficient to store the full blkdebug node filename. In this case, we should not generate a filename at all instead of an unusable one. Cc: qemu-stable@nongnu.org Reported-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20170613172006.19685-2-mreitz@redhat.com Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-06-26block: change variable names in BlockDriverStateManos Pitsidianakis
Change the 'int count' parameter in *pwrite_zeros, *pdiscard related functions (and some others) to 'int bytes', as they both refer to bytes. This helps with code legibility. Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Message-id: 20170609101808.13506-1-el13635@mail.ntua.gr Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-06-26block: Remove bdrv_aio_readv/writev/flush()Kevin Wolf
These functions are unused now. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-06-26qed: Use bdrv_co_* for coroutine_fnsKevin Wolf
All functions that are marked coroutine_fn can directly call the bdrv_co_* version of functions instead of going through the wrapper. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-06-26qed: Add coroutine_fn to I/O path functionsKevin Wolf
Now that we stay in coroutine context for the whole request when doing reads or writes, we can add coroutine_fn annotations to many functions that can do I/O or yield directly. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-06-26qed: Use a coroutine for need_check_timerKevin Wolf
This fixes the last place where we degraded from AIO to actual blocking synchronous I/O requests. Putting it into a coroutine means that instead of blocking, the coroutine simply yields while doing I/O. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-06-26qed: Simplify request handlingKevin Wolf
Now that we process a request in the same coroutine from beginning to end and don't drop out of it any more, we can look like a proper coroutine-based driver and simply call qed_aio_next_io() and get a return value from it instead of spawning an additional coroutine that reenters the parent when it's done. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-06-26qed: Use CoQueue for serialising allocationsKevin Wolf
Now that we're running in coroutine context, the ad-hoc serialisation code (which drops a request that has to wait out of coroutine context) can be replaced by a CoQueue. This means that when we resume a serialised request, it is running in coroutine context again and its I/O isn't blocking any more. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-06-26qed: Implement .bdrv_co_readv/writevKevin Wolf
Most of the qed code is now synchronous and matches the coroutine model. One notable exception is the serialisation between requests which can still schedule a callback. Before we can replace this with coroutine locks, let's convert the driver's external interfaces to the coroutine versions. We need to be careful to handle both requests that call the completion callback directly from the calling coroutine (i.e. fully synchronous code) and requests that involve some callback, so that we need to yield and wait for the completion callback coming from outside the coroutine. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-06-26qed: Remove recursion in qed_aio_next_io()Kevin Wolf
Instead of calling itself recursively as the last thing, just convert qed_aio_next_io() into a loop. This patch is best reviewed with 'git show -w' because most of it is just whitespace changes. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-06-26qed: Remove ret argument from qed_aio_next_io()Kevin Wolf
All callers pass ret = 0, so we can just remove it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-06-26qed: Add return value to qed_aio_read/write_data()Kevin Wolf
Don't recurse into qed_aio_next_io() and qed_aio_complete() here, but just return an error code and let the caller handle it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-06-26qed: Add return value to qed_aio_write_inplace/alloc()Kevin Wolf
Don't recurse into qed_aio_next_io() and qed_aio_complete() here, but just return an error code and let the caller handle it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-06-26qed: Add return value to qed_aio_write_cow()Kevin Wolf
Don't recurse into qed_aio_next_io() and qed_aio_complete() here, but just return an error code and let the caller handle it. While refactoring qed_aio_write_alloc() to accomodate the change, qed_aio_write_zero_cluster() ended up with a single line, so I chose to inline that line and remove the function completely. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-06-26qed: Add return value to qed_aio_write_main()Kevin Wolf
Don't recurse into qed_aio_next_io() and qed_aio_complete() here, but just return an error code and let the caller handle it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-06-26qed: Add return value to qed_aio_write_l2_update()Kevin Wolf
Don't recurse into qed_aio_next_io() and qed_aio_complete() here, but just return an error code and let the caller handle it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-06-26qed: Add return value to qed_aio_write_l1_update()Kevin Wolf
Don't recurse into qed_aio_next_io() and qed_aio_complete() here, but just return an error code and let the caller handle it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-06-26qed: Inline qed_commit_l2_update()Kevin Wolf
qed_commit_l2_update() is unconditionally called at the end of qed_aio_write_l1_update(). Inline it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-06-26qed: Make qed_aio_write_main() synchronousKevin Wolf
Note that this code is generally not running in coroutine context, so this is an actual blocking synchronous operation. We'll fix this in a moment. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-06-26qed: Make qed_aio_read_data() synchronousKevin Wolf
Note that this code is generally not running in coroutine context, so this is an actual blocking synchronous operation. We'll fix this in a moment. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-06-26qed: Remove callback from qed_write_table()Kevin Wolf
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-06-26qed: Remove GenericCBKevin Wolf
The GenericCB infrastructure isn't used any more. Remove it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-06-26qed: Make qed_write_table() synchronousKevin Wolf
Note that this code is generally not running in coroutine context, so this is an actual blocking synchronous operation. We'll fix this in a moment. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-06-26qed: Remove callback from qed_write_header()Kevin Wolf
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-06-26qed: Make qed_write_header() synchronousKevin Wolf
Note that this code is generally not running in coroutine context, so this is an actual blocking synchronous operation. We'll fix this in a moment. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-06-26qed: Remove callback from qed_copy_from_backing_file()Kevin Wolf
With this change, qed_aio_write_prefill() and qed_aio_write_postfill() collapse into a single function. This is reflected by a rename of the combined function to qed_aio_write_cow(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-06-26qed: Make qed_copy_from_backing_file() synchronousKevin Wolf
Note that this code is generally not running in coroutine context, so this is an actual blocking synchronous operation. We'll fix this in a moment. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-06-26qed: Make qed_read_backing_file() synchronousKevin Wolf
Note that this code is generally not running in coroutine context, so this is an actual blocking synchronous operation. We'll fix this in a moment. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-06-26qed: Remove callback from qed_find_cluster()Kevin Wolf
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-06-26qed: Remove callback from qed_read_l2_table()Kevin Wolf
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-06-26qed: Remove callback from qed_read_table()Kevin Wolf
Instead of passing the return value to a callback, return it to the caller so that the callback can be inlined there. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-06-26qed: Make qed_read_table() synchronousKevin Wolf
Note that this code is generally not running in coroutine context, so this is an actual blocking synchronous operation. We'll fix this in a moment. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-06-26qed: Use bottom half to resume waiting requestsKevin Wolf
The qed driver serialises allocating write requests. When the active allocation is finished, the AIO callback is called, but after this, the next allocating request is immediately processed instead of leaving the coroutine. Resuming another allocation request in the same request coroutine means that the request now runs in the wrong coroutine. The following is one of the possible effects of this: The completed request will generally reenter its request coroutine in a bottom half, expecting that it completes the request in bdrv_driver_pwritev(). However, if the second request actually yielded before leaving the coroutine, the reused request coroutine is in an entirely different place and is reentered prematurely. Not a good idea. Let's make sure that we exit the coroutine after completing the first request by resuming the next allocating request only with a bottom half. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-06-26qcow2: Use offset_into_cluster() and offset_to_l2_index()Alberto Garcia
We already have functions for doing these calculations, so let's use them instead of doing everything by hand. This makes the code a bit more readable. Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-06-26qcow2: Merge the writing of the COW regions with the guest dataAlberto Garcia
If the guest tries to write data that results on the allocation of a new cluster, instead of writing the guest data first and then the data from the COW regions, write everything together using one single I/O operation. This can improve the write performance by 25% or more, depending on several factors such as the media type, the cluster size and the I/O request size. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-06-26qcow2: Pass a QEMUIOVector to do_perform_cow_{read,write}()Alberto Garcia
Instead of passing a single buffer pointer to do_perform_cow_write(), pass a QEMUIOVector. This will allow us to merge the write requests for the COW regions and the actual data into a single one. Although do_perform_cow_read() does not strictly need to change its API, we're doing it here as well for consistency. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-06-26qcow2: Allow reading both COW regions with only one requestAlberto Garcia
Reading both COW regions requires two separate requests, but it's perfectly possible to merge them and perform only one. This generally improves performance, particularly on rotating disk drives. The downside is that the data in the middle region is read but discarded. This patch takes a conservative approach and only merges reads when the size of the middle region is <= 16KB. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-06-26qcow2: Split do_perform_cow() into _read(), _encrypt() and _write()Alberto Garcia
This patch splits do_perform_cow() into three separate functions to read, encrypt and write the COW regions. perform_cow() can now read both regions first, then encrypt them and finally write them to disk. The memory allocation is also done in this function now, using one single buffer large enough to hold both regions. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-06-26qcow2: Make perform_cow() call do_perform_cow() twiceAlberto Garcia
Instead of calling perform_cow() twice with a different COW region each time, call it just once and make perform_cow() handle both regions. This patch simply moves code around. The next one will do the actual reordering of the COW operations. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-06-26qcow2: Use unsigned int for both members of Qcow2COWRegionAlberto Garcia
Qcow2COWRegion has two attributes: - The offset of the COW region from the start of the first cluster touched by the I/O request. Since it's always going to be positive and the maximum request size is at most INT_MAX, we can use a regular unsigned int to store this offset. - The size of the COW region in bytes. This is guaranteed to be >= 0, so we should use an unsigned type instead. In x86_64 this reduces the size of Qcow2COWRegion from 16 to 8 bytes. It will also help keep some assertions simpler now that we know that there are no negative numbers. The prototype of do_perform_cow() is also updated to reflect these changes. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-06-26qcow2: Remove unused Error variable in do_perform_cow()Alberto Garcia
We are using the return value of qcow2_encrypt_sectors() to detect problems but we are throwing away the returned Error since we have no way to report it to the user. Therefore we can simply get rid of the local Error variable and pass NULL instead. Alternatively we could try to figure out a way to pass the original error instead of simply returning -EIO, but that would be more invasive, so let's keep the current approach. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-06-26throttle: Update throttle-groups.c documentationAlberto Garcia
There used to be throttle_timers_{detach,attach}_aio_context() calls in bdrv_set_aio_context(), but since 7ca7f0f6db1fedd28d490795d778cf239 they are now in blk_set_aio_context(). Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-06-26block: use BDRV_POLL_WHILE() in bdrv_rw_vmstate()Stefan Hajnoczi
Calling aio_poll() directly may have been fine previously, but this is the future, man! The difference between an aio_poll() loop and BDRV_POLL_WHILE() is that BDRV_POLL_WHILE() releases the AioContext around aio_poll(). This allows the IOThread to run fd handlers or BHs to complete the request. Failure to release the AioContext causes deadlocks. Using BDRV_POLL_WHILE() partially fixes a 'savevm' hang with -object iothread. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-06-26block: count bdrv_co_rw_vmstate() requestsStefan Hajnoczi
Call bdrv_inc/dec_in_flight() for vmstate reads/writes. This seems unnecessary at first glance because vmstate reads/writes are done synchronously while the guest is stopped. But we need the bdrv_wakeup() in bdrv_dec_in_flight() so the main loop sees request completion. Besides, it's cleaner to count vmstate reads/writes like ordinary read/write requests. The bdrv_wakeup() partially fixes a 'savevm' hang with -object iothread. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-06-26commit: Fix completion with extra referenceKevin Wolf
commit_complete() can't assume that after its block_job_completed() the job is actually immediately freed; someone else may still be holding references. In this case, the op blockers on the intermediate nodes make the graph reconfiguration in the completion code fail. Call block_job_remove_all_bdrv() manually so that we know for sure that any blockers on intermediate nodes are given up. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>