aboutsummaryrefslogtreecommitdiff
path: root/include/block
AgeCommit message (Collapse)Author
2018-03-13Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into ↵Peter Maydell
staging # gpg: Signature made Mon 12 Mar 2018 16:01:16 GMT # gpg: using RSA key 9CA4ABB381AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" # Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8 * remotes/stefanha/tags/block-pull-request: block: make BDRV_POLL_WHILE() re-entrancy safe Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-03-12block: make BDRV_POLL_WHILE() re-entrancy safeStefan Hajnoczi
Nested BDRV_POLL_WHILE() calls can occur. Currently assert(!wait_->wakeup) fails in AIO_WAIT_WHILE() when this happens. This patch converts the bool wait_->need_kick flag to an unsigned wait_->num_waiters counter. Nesting works correctly because outer AIO_WAIT_WHILE() callers evaluate the condition again after the inner caller completes (invoking the inner caller counts as aio_poll() progress). Reported-by: "fuweiwei (C)" <fuweiwei2@huawei.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20180307124619.6218-1-stefanha@redhat.com Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-03-12Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into stagingPeter Maydell
Block layer patches # gpg: Signature made Fri 09 Mar 2018 15:09:20 GMT # gpg: using RSA key 7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (56 commits) qemu-iotests: fix 203 migration completion race iotests: Tweak 030 in order to trigger a race condition with parallel jobs iotests: Skip test for ENOMEM error iotests: Mark all tests executable iotests: Test creating overlay when guest running qemu-iotests: Test ssh image creation over QMP qemu-iotests: Test qcow2 over file image creation with QMP block: Fail bdrv_truncate() with negative size file-posix: Fix no-op bdrv_truncate() with falloc preallocation ssh: Support .bdrv_co_create ssh: Pass BlockdevOptionsSsh to connect_to_ssh() ssh: QAPIfy host-key-check option ssh: Use QAPI BlockdevOptionsSsh object sheepdog: Support .bdrv_co_create sheepdog: QAPIfy "redundancy" create option nfs: Support .bdrv_co_create nfs: Use QAPI options in nfs_client_open() rbd: Use qemu_rbd_connect() in qemu_rbd_do_create() rbd: Assign s->snap/image_name in qemu_rbd_open() rbd: Support .bdrv_co_create ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-03-09block: x-blockdev-create QMP commandKevin Wolf
This adds a synchronous x-blockdev-create QMP command that can create qcow2 images on a given node name. We don't want to block while creating an image, so this is not the final interface in all aspects, but BlockdevCreateOptionsQcow2 and .bdrv_co_create() are what they actually might look like in the end. In any case, this should be good enough to test whether we interpret BlockdevCreateOptions as we should. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2018-03-09block: Make bdrv_is_whitelisted() publicKevin Wolf
We'll use a separate source file for image creation, and we need to check there whether the requested driver is whitelisted. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2018-03-09qcow2: Use BlockdevRef in qcow2_co_create()Kevin Wolf
Instead of passing a separate BlockDriverState* into qcow2_co_create(), make use of the BlockdevRef that is included in BlockdevCreateOptions. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-03-09block: convert bdrv_check callback to coroutine_fnPaolo Bonzini
Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1516279431-30424-8-git-send-email-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-09block: convert bdrv_invalidate_cache callback to coroutine_fnPaolo Bonzini
QED's bdrv_invalidate_cache implementation would like to reuse functions that acquire/release the metadata locks. Call it from coroutine context to simplify the logic. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1516279431-30424-6-git-send-email-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-08block: add aio_wait_bh_oneshot()Stefan Hajnoczi
Sometimes it's necessary for the main loop thread to run a BH in an IOThread and wait for its completion. This primitive is useful during startup/shutdown to synchronize and avoid race conditions. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20180307144205.20619-2-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-03-06Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into stagingPeter Maydell
Block layer patches # gpg: Signature made Mon 05 Mar 2018 17:45:51 GMT # gpg: using RSA key 7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (38 commits) block: Fix NULL dereference on empty drive error qcow2: Replace align_offset() with ROUND_UP() block/ssh: Add basic .bdrv_truncate() block/ssh: Make ssh_grow_file() blocking block/ssh: Pull ssh_grow_file() from ssh_create() qemu-img: Make resize error message more general qcow2: make qcow2_co_create2() a coroutine_fn block: rename .bdrv_create() to .bdrv_co_create_opts() Revert "IDE: Do not flush empty CDROM drives" block: test blk_aio_flush() with blk->root == NULL block: add BlockBackend->in_flight counter block: extract AIO_WAIT_WHILE() from BlockDriverState aio: rename aio_context_in_iothread() to in_aio_context_home_thread() docs: document how to use the l2-cache-entry-size parameter specs/qcow2: Fix documentation of the compressed cluster descriptor iotest 033: add misaligned write-zeroes test via truncate block: fix write with zero flag set and iovector provided block: Drop unused .bdrv_co_get_block_status() vvfat: Switch to .bdrv_co_block_status() vpc: Switch to .bdrv_co_block_status() ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org> # Conflicts: # include/block/block.h
2018-03-02Include less of the generated modular QAPI headersMarkus Armbruster
In my "build everything" tree, a change to the types in qapi-schema.json triggers a recompile of about 4800 out of 5100 objects. The previous commit split up qmp-commands.h, qmp-event.h, qmp-visit.h, qapi-types.h. Each of these headers still includes all its shards. Reduce compile time by including just the shards we actually need. To illustrate the benefits: adding a type to qapi/migration.json now recompiles some 2300 instead of 4800 objects. The next commit will improve it further. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180211093607.27351-24-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> [eblake: rebase to master] Signed-off-by: Eric Blake <eblake@redhat.com>
2018-03-02block: rename .bdrv_create() to .bdrv_co_create_opts()Stefan Hajnoczi
BlockDriver->bdrv_create() has been called from coroutine context since commit 5b7e1542cfa41a281af9629d31cef03704d976e6 ("block: make bdrv_create adopt coroutine"). Make this explicit by renaming to .bdrv_co_create_opts() and add the coroutine_fn annotation. This makes it obvious to block driver authors that they may yield, use CoMutex, or other coroutine_fn APIs. bdrv_co_create is reserved for the QAPI-based version that Kevin is working on. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20170705102231.20711-2-stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-02block: extract AIO_WAIT_WHILE() from BlockDriverStateStefan Hajnoczi
BlockDriverState has the BDRV_POLL_WHILE() macro to wait on event loop activity while a condition evaluates to true. This is used to implement synchronous operations where it acts as a condvar between the IOThread running the operation and the main loop waiting for the operation. It can also be called from the thread that owns the AioContext and in that case it's just a nested event loop. BlockBackend needs this behavior but doesn't always have a BlockDriverState it can use. This patch extracts BDRV_POLL_WHILE() into the AioWait abstraction, which can be used with AioContext and isn't tied to BlockDriverState anymore. This feature could be built directly into AioContext but then all users would kick the event loop even if they signal different conditions. Imagine an AioContext with many BlockDriverStates, each time a request completes any waiter would wake up and re-check their condition. It's nicer to keep a separate AioWait object for each condition instead. Please see "block/aio-wait.h" for details on the API. The name AIO_WAIT_WHILE() avoids the confusion between AIO_POLL_WHILE() and AioContext polling. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-02aio: rename aio_context_in_iothread() to in_aio_context_home_thread()Stefan Hajnoczi
The name aio_context_in_iothread() is misleading because it also returns true when called on the main AioContext from the main loop thread, which is not an IOThread. This patch renames it to in_aio_context_home_thread() and expands the doc comment to make the semantics clearer. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-02block: Drop unused .bdrv_co_get_block_status()Eric Blake
We are gradually moving away from sector-based interfaces, towards byte-based. Now that all drivers have been updated to provide the byte-based .bdrv_co_block_status(), we can delete the sector-based interface. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-02block: Switch passthrough drivers to .bdrv_co_block_status()Eric Blake
We are gradually moving away from sector-based interfaces, towards byte-based. Update the generic helpers, and all passthrough clients (blkdebug, commit, mirror, throttle) accordingly. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-02block: Add .bdrv_co_block_status() callbackEric Blake
We are gradually moving away from sector-based interfaces, towards byte-based. Now that the block layer exposes byte-based allocation, it's time to tackle the drivers. Add a new callback that operates on as small as byte boundaries. Subsequent patches will then update individual drivers, then finally remove .bdrv_co_get_block_status(). The new code also passes through the 'want_zero' hint, which will allow subsequent patches to further optimize callers that only care about how much of the image is allocated (want_zero is false), rather than full details about runs of zeroes and which offsets the allocation actually maps to (want_zero is true). As part of this effort, fix another part of the documentation: the claim in commit 4c41cb4 that BDRV_BLOCK_ALLOCATED is short for 'DATA || ZERO' is a lie at the block layer (see commit e88ae2264), even though it is how the bit is computed from the driver layer. After all, there are intentionally cases where we return ZERO but not ALLOCATED at the block layer, when we know that a read sees zero because the backing file is too short. Note that the driver interface is thus slightly different than the public interface with regards to which bits will be set, and what guarantees are provided on input. We also add an assertion that any driver using the new callback will make progress (the only time pnum will be 0 is if the block layer already handled an out-of-bounds request, or if there is an error); the old driver interface did not provide this guarantee, which could lead to some inf-loops in drastic corner-case failures. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-01nbd: BLOCK_STATUS constantsVladimir Sementsov-Ogievskiy
Expose the new constants and structs that will be used by both server and client implementations of NBD_CMD_BLOCK_STATUS (the command is currently experimental at https://github.com/NetworkBlockDevice/nbd/blob/extension-blockstatus/doc/proto.md but will hopefully be stabilized soon). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <1518702707-7077-4-git-send-email-vsementsov@virtuozzo.com> [eblake: split from larger patch on server implementation] Signed-off-by: Eric Blake <eblake@redhat.com>
2018-03-01nbd: change indenting in nbd.hVladimir Sementsov-Ogievskiy
Prepared indenting for the following patch. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <1518702707-7077-3-git-send-email-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2018-02-13block: maintain persistent disabled bitmapsVladimir Sementsov-Ogievskiy
To maintain load/store disabled bitmap there is new approach: - deprecate @autoload flag of block-dirty-bitmap-add, make it ignored - store enabled bitmaps as "auto" to qcow2 - store disabled bitmaps without "auto" flag to qcow2 - on qcow2 open load "auto" bitmaps as enabled and others as disabled (except in_use bitmaps) Also, adjust iotests 165 and 176 appropriately. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20180202160752.143796-1-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-02-09block: Simplify bdrv_can_write_zeroes_with_unmap()Eric Blake
We don't need the can_write_zeroes_with_unmap field in BlockDriverInfo, because it is redundant information with supported_zero_flags & BDRV_REQ_MAY_UNMAP. Note that BlockDriverInfo and supported_zero_flags are both per-device settings, rather than global state about the driver as a whole, which means one or both of these bits of information can already be conditional. Let's audit how they were set: crypto: always setting can_write_ to false is pointless (the struct starts life zero-initialized), no use of supported_ nbd: just recently fixed to set can_write_ if supported_ includes MAY_UNMAP (thus this commit effectively reverts bca80059e and solves the problem mentioned there in a more global way) file-posix, iscsi, qcow2: can_write_ is conditional, while supported_ was unconditional; but passing MAY_UNMAP would fail with ENOTSUP if the condition wasn't met qed: can_write_ is unconditional, but pwrite_zeroes lacks support for MAY_UNMAP and supported_ is not set. Perhaps support can be added later (since it would be similar to qcow2), but for now claiming false is no real loss all other drivers: can_write_ is not set, and supported_ is either unset or a passthrough Simplify the code by moving the conditional into supported_zero_flags for all drivers, then dropping the now-unused BDI field. For callers that relied on bdrv_can_write_zeroes_with_unmap(), we return the same per-device settings for drivers that had conditions (no observable change in behavior there); and can now return true (instead of false) for drivers that support passthrough (for example, the commit driver) which gives those drivers the same fix as nbd just got in bca80059e. For callers that relied on supported_zero_flags, we now have a few more places that can avoid a wasted call to pwrite_zeroes() that will just fail with ENOTSUP. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20180126193439.20219-1-eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-02-09Move include qemu/option.h from qemu-common.h to actual usersMarkus Armbruster
qemu-common.h includes qemu/option.h, but most places that include the former don't actually need the latter. Drop the include, and add it to the places that actually need it. While there, drop superfluous includes of both headers, and separate #include from file comment with a blank line. This cleanup makes the number of objects depending on qemu/option.h drop from 4545 (out of 4743) to 284 in my "build everything" tree. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180201111846.21846-20-armbru@redhat.com> [Semantic conflict with commit bdd6a90a9e in block/nvme.c resolved]
2018-02-09Include qapi/qmp/qdict.h exactly where neededMarkus Armbruster
This cleanup makes the number of objects depending on qapi/qmp/qdict.h drop from 4550 (out of 4743) to 368 in my "build everything" tree. For qapi/qmp/qobject.h, the number drops from 4552 to 390. While there, separate #include from file comment with a blank line. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180201111846.21846-13-armbru@redhat.com>
2018-02-09Include qapi/qmp/qobject.h exactly where neededMarkus Armbruster
Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180201111846.21846-11-armbru@redhat.com>
2018-02-09Drop superfluous includes of qapi-types.h and test-qapi-types.hMarkus Armbruster
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180201111846.21846-4-armbru@redhat.com>
2018-02-08block: Move NVMe constants to a separate headerFam Zheng
Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20180116060901.17413-8-famz@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>
2018-02-08block: Introduce buf register APIFam Zheng
Allow block driver to map and unmap a buffer for later I/O, as a performance hint. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20180116060901.17413-5-famz@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>
2018-01-26qapi: add nbd-server-removeVladimir Sementsov-Ogievskiy
Add command for removing an export. It is needed for cases when we don't want to keep the export after the operation on it was completed. The other example is a temporary node, created with blockdev-add. If we want to delete it we should firstly remove any corresponding NBD export. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180119135719.24745-3-vsementsov@virtuozzo.com> [eblake: drop dead nb_clients code] Signed-off-by: Eric Blake <eblake@redhat.com>
2018-01-10nbd: rename nbd_option and nbd_opt_replyVladimir Sementsov-Ogievskiy
Rename nbd_option and nbd_opt_reply to NBDOption and NBDOptionReply to correspond to Qemu coding style and other structures here. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20171122101958.17065-5-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2017-12-22block: Allow graph changes in subtree drained sectionKevin Wolf
We need to remember how many of the drain sections in which a node is were recursive (i.e. subtree drain rather than node drain), so that they can be correctly applied when children are added or removed during the drained section. With this change, it is safe to modify the graph even inside a bdrv_subtree_drained_begin/end() section. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-12-22block: Add bdrv_subtree_drained_begin/end()Kevin Wolf
bdrv_drained_begin() waits for the completion of requests in the whole subtree, but it only actually keeps its immediate bs parameter quiesced until bdrv_drained_end(). Add a version that keeps the whole subtree drained. As of this commit, graph changes cannot be allowed during a subtree drained section, but this will be fixed soon. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-12-22block: Don't notify parents in drain call chainKevin Wolf
This is in preparation for subtree drains, i.e. drained sections that affect not only a single node, but recursively all child nodes, too. Calling the parent callbacks for drain is pointless when we just came from that parent node recursively and leads to multiple increases of bs->quiesce_counter in a single drain call. Don't do it. In order for this to work correctly, the parent callback must be called for every bdrv_drain_begin/end() call, not only for the outermost one: If we have a node N with two parents A and B, recursive draining of A should cause the quiesce_counter of B to increase because its child N is drained independently of B. If now B is recursively drained, too, A must increase its quiesce_counter because N is drained independently of A only now, even if N is going from quiesce_counter 1 to 2. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-12-22block: Remove unused bdrv_requests_pendingFam Zheng
Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-12-18hbitmap: add next_zero functionVladimir Sementsov-Ogievskiy
The function searches for next zero bit. Also add interface for BdrvDirtyBitmap and unit test. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20171012135313.227864-2-vsementsov@virtuozzo.com Signed-off-by: Jeff Cody <jcody@redhat.com>
2017-11-29blockjob: reimplement block_job_sleep_ns to allow cancellationPaolo Bonzini
This reverts the effects of commit 4afeffc857 ("blockjob: do not allow coroutine double entry or entry-after-completion", 2017-11-21) This fixed the symptom of a bug rather than the root cause. Canceling the wait on a sleeping blockjob coroutine is generally fine, we just need to make it work correctly across AioContexts. To do so, use a QEMUTimer that calls block_job_enter. Use a mutex to ensure that block_job_enter synchronizes correctly with block_job_sleep_ns. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Tested-By: Jeff Cody <jcody@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-11-29blockjob: remove clock argument from block_job_sleep_nsPaolo Bonzini
All callers are using QEMU_CLOCK_REALTIME, and it will not be possible to support more than one clock when block_job_sleep_ns switches to a single timer stored in the BlockJob struct. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Tested-By: Jeff Cody <jcody@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-11-21Merge remote-tracking branch 'remotes/cody/tags/block-pull-request' into stagingPeter Maydell
# gpg: Signature made Tue 21 Nov 2017 17:01:33 GMT # gpg: using RSA key 0xBDBE7B27C0DE3057 # gpg: Good signature from "Jeffrey Cody <jcody@redhat.com>" # gpg: aka "Jeffrey Cody <jeff@codyprime.org>" # gpg: aka "Jeffrey Cody <codyprime@gmail.com>" # Primary key fingerprint: 9957 4B4D 3474 90E7 9D98 D624 BDBE 7B27 C0DE 3057 * remotes/cody/tags/block-pull-request: qemu-iotest: add test for blockjob coroutine race condition qemu-iotests: add option in common.qemu for mismatch only coroutine: abort if we try to schedule or enter a pending coroutine blockjob: do not allow coroutine double entry or entry-after-completion Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-11-21blockjob: do not allow coroutine double entry or entry-after-completionJeff Cody
When block_job_sleep_ns() is called, the co-routine is scheduled for future execution. If we allow the job to be re-entered prior to the scheduled time, we present a race condition in which a coroutine can be entered recursively, or even entered after the coroutine is deleted. The job->busy flag is used by blockjobs when a coroutine is busy executing. The function 'block_job_enter()' obeys the busy flag, and will not enter a coroutine if set. If we sleep a job, we need to leave the busy flag set, so that subsequent calls to block_job_enter() are prevented. This changes the prior behavior of block_job_cancel() being able to immediately wake up and cancel a job; in practice, this should not be an issue, as the coroutine sleep times are generally very small, and the cancel will occur the next time the coroutine wakes up. This fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1508708 Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-11-21block: Add errp to bdrv_all_goto_snapshot()Kevin Wolf
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com>
2017-11-21block: Add errp to bdrv_snapshot_goto()Kevin Wolf
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com>
2017-11-17block: Make bdrv_next() keep strong referencesMax Reitz
On one hand, it is a good idea for bdrv_next() to return a strong reference because ideally nearly every pointer should be refcounted. This fixes intermittent failure of iotest 194. On the other, it is absolutely necessary for bdrv_next() itself to keep a strong reference to both the BB (in its first phase) and the BDS (at least in the second phase) because when called the next time, it will dereference those objects to get a link to the next one. Therefore, it needs these objects to stay around until then. Just storing the pointer to the next in the iterator is not really viable because that pointer might become invalid as well. Both arguments taken together means we should probably just invoke bdrv_ref() and blk_ref() in bdrv_next(). This means we have to assert that bdrv_next() is always called from the main loop, but that was probably necessary already before this patch and judging from the callers, it also looks to actually be the case. Keeping these strong references means however that callers need to give them up if they decide to abort the iteration early. They can do so through the new bdrv_next_cleanup() function. Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20171110172545.32609-1-mreitz@redhat.com Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-11-09nbd: Fix struct name for structured readsEric Blake
A closer read of the NBD spec shows that a structured reply chunk for a hole is not quite identical to the prefix of a data chunk, because the hole has to also send a 32-bit size field. Although we do not yet send holes, we should fix the misleading information in our header and make it easier for a future patch to support sparse reads. Messed up in commit bae245d1. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20171108215703.9295-5-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2017-10-30nbd: Minimal structured read for clientVladimir Sementsov-Ogievskiy
Minimal implementation: for structured error only error_report error message. Note that test 83 is now more verbose, because the implementation prints more warnings about unexpected communication errors; perhaps future patches should tone things down by using trace messages instead of traces, but the common case of successful communication is no noisier than before. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20171027104037.8319-13-eblake@redhat.com>
2017-10-30nbd: Move nbd_read() to common headerEric Blake
An upcoming change to block/nbd-client.c will want to read the tail of a structured reply chunk directly from the wire. Move this function to make it easier. Based on a patch from Vladimir Sementsov-Ogievskiy. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20171027104037.8319-12-eblake@redhat.com>
2017-10-30nbd/client: prepare nbd_receive_reply for structured replyVladimir Sementsov-Ogievskiy
In following patch nbd_receive_reply will be used both for simple and structured reply header receiving. NBDReply is altered into union of simple reply header and structured reply chunk header, simple error translation moved to block/nbd-client to be consistent with further structured reply error translation. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20171027104037.8319-11-eblake@redhat.com>
2017-10-30nbd: Expose constants and structs for structured readEric Blake
Upcoming patches will implement the NBD structured reply extension [1] for both client and server roles. Declare the constants, structs, and lookup routines that will be valuable whether the server or client code is backported in isolation. This includes moving one constant from an internal header to the public header, as part of the structured read processing will be done in block/nbd-client.c rather than nbd/client.c. [1]https://github.com/NetworkBlockDevice/nbd/blob/extension-structured-reply/doc/proto.md Based on patches from Vladimir Sementsov-Ogievskiy. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20171027104037.8319-4-eblake@redhat.com>
2017-10-30nbd: Move nbd_errno_to_system_errno() to public headerEric Blake
This is needed in preparation for structured reply handling, as we will be performing the translation from NBD error to system errno value higher in the stack at block/nbd-client.c. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20171027104037.8319-3-eblake@redhat.com>
2017-10-26block: Align block status requestsEric Blake
Any device that has request_alignment greater than 512 should be unable to report status at a finer granularity; it may also be simpler for such devices to be guaranteed that the block layer has rounded things out to the granularity boundary (the way the block layer already rounds all other I/O out). Besides, getting the code correct for super-sector alignment also benefits us for the fact that our public interface now has byte granularity, even though none of our drivers have byte-level callbacks. Add an assertion in blkdebug that proves that the block layer never requests status of unaligned sections, similar to what it does on other requests (while still keeping the generic helper in place for when future patches add a throttle driver). Note that iotest 177 already covers this (it would fail if you use just the blkdebug.c hunk without the io.c changes). Meanwhile, we can drop assertions in callers that no longer have to pass in sector-aligned addresses. There is a mid-function scope added for 'count' and 'longret', for a couple of reasons: first, an upcoming patch will add an 'if' statement that checks whether a driver has an old- or new-style callback, and can conveniently use the same scope for less indentation churn at that time. Second, since we are trying to get rid of sector-based computations, wrapping things in a scope makes it easier to group and see what will be deleted in a final cleanup patch once all drivers have been converted to the new-style callback. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-26block: Convert bdrv_get_block_status_above() to bytesEric Blake
We are gradually moving away from sector-based interfaces, towards byte-based. In the common case, allocation is unlikely to ever use values that are not naturally sector-aligned, but it is possible that byte-based values will let us be more precise about allocation at the end of an unaligned file that can do byte-based access. Changing the name of the function from bdrv_get_block_status_above() to bdrv_block_status_above() ensures that the compiler enforces that all callers are updated. Likewise, since it a byte interface allows an offset mapping that might not be sector aligned, split the mapping out of the return value and into a pass-by-reference parameter. For now, the io.c layer still assert()s that all uses are sector-aligned, but that can be relaxed when a later patch implements byte-based block status in the drivers. For the most part this patch is just the addition of scaling at the callers followed by inverse scaling at bdrv_block_status(), plus updates for the new split return interface. But some code, particularly bdrv_block_status(), gets a lot simpler because it no longer has to mess with sectors. Likewise, mirror code no longer computes s->granularity >> BDRV_SECTOR_BITS, and can therefore drop an assertion about alignment because the loop no longer depends on alignment (never mind that we don't really have a driver that reports sub-sector alignments, so it's not really possible to test the effect of sub-sector mirroring). Fix a neighboring assertion to use is_power_of_2 while there. For ease of review, bdrv_get_block_status() was tackled separately. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-26block: Convert bdrv_get_block_status() to bytesEric Blake
We are gradually moving away from sector-based interfaces, towards byte-based. In the common case, allocation is unlikely to ever use values that are not naturally sector-aligned, but it is possible that byte-based values will let us be more precise about allocation at the end of an unaligned file that can do byte-based access. Changing the name of the function from bdrv_get_block_status() to bdrv_block_status() ensures that the compiler enforces that all callers are updated. For now, the io.c layer still assert()s that all callers are sector-aligned, but that can be relaxed when a later patch implements byte-based block status in the drivers. There was an inherent limitation in returning the offset via the return value: we only have room for BDRV_BLOCK_OFFSET_MASK bits, which means an offset can only be mapped for sector-aligned queries (or, if we declare that non-aligned input is at the same relative position modulo 512 of the answer), so the new interface also changes things to return the offset via output through a parameter by reference rather than mashed into the return value. We'll have some glue code that munges between the two styles until we finish converting all uses. For the most part this patch is just the addition of scaling at the callers followed by inverse scaling at bdrv_block_status(), coupled with the tweak in calling convention. But some code, particularly bdrv_is_allocated(), gets a lot simpler because it no longer has to mess with sectors. For ease of review, bdrv_get_block_status_above() will be tackled separately. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>