aboutsummaryrefslogtreecommitdiff
path: root/block
AgeCommit message (Collapse)Author
2023-06-05bulk: Remove pointless QOM castsPhilippe Mathieu-Daudé
Mechanical change running Coccinelle spatch with content generated from the qom-cast-macro-clean-cocci-gen.py added in the previous commit. Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20230601093452.38972-3-philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
2023-06-02cutils: Adjust signature of parse_uint[_full]Eric Blake
It's already confusing that we have two very similar functions for wrapping the parse of a 64-bit unsigned value, differing mainly on whether they permit leading '-'. Adjust the signature of parse_uint() and parse_uint_full() to be like all of qemu_strto*(): put the result parameter last, use the same types (uint64_t and unsigned long long have the same width, but are not always the same type), and mark endptr const (this latter change only affects the rare caller of parse_uint). Adjust all callers in the tree. While at it, note that since cutils.c already includes: QEMU_BUILD_BUG_ON(sizeof(int64_t) != sizeof(long long)); we are guaranteed that the result of parse_uint* cannot exceed UINT64_MAX (or the build would have failed), so we can drop pre-existing dead comparisons in opts-visitor.c that were never false. Reviewed-by: Hanna Czenczek <hreitz@redhat.com> Message-Id: <20230522190441.64278-8-eblake@redhat.com> [eblake: Drop dead code spotted by Markus] Signed-off-by: Eric Blake <eblake@redhat.com>
2023-06-01block/blkio: use qemu_open() to support fd passing for virtio-blkStefano Garzarella
Some virtio-blk drivers (e.g. virtio-blk-vhost-vdpa) supports the fd passing. Let's expose this to the user, so the management layer can pass the file descriptor of an already opened path. If the libblkio virtio-blk driver supports fd passing, let's always use qemu_open() to open the `path`, so we can handle fd passing from the management layer through the "/dev/fdset/N" special path. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20230530071941.8954-2-sgarzare@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-06-01block: remove bdrv_co_io_plug() APIStefan Hajnoczi
No block driver implements .bdrv_co_io_plug() anymore. Get rid of the function pointers. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Kevin Wolf <kwolf@redhat.com> Message-id: 20230530180959.1108766-7-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-06-01block/linux-aio: convert to blk_io_plug_call() APIStefan Hajnoczi
Stop using the .bdrv_co_io_plug() API because it is not multi-queue block layer friendly. Use the new blk_io_plug_call() API to batch I/O submission instead. Note that a dev_max_batch check is dropped in laio_io_unplug() because the semantics of unplug_fn() are different from .bdrv_co_unplug(): 1. unplug_fn() is only called when the last blk_io_unplug() call occurs, not every time blk_io_unplug() is called. 2. unplug_fn() is per-thread, not per-BlockDriverState, so there is no way to get per-BlockDriverState fields like dev_max_batch. Therefore this condition cannot be moved to laio_unplug_fn(). It is not obvious that this condition affects performance in practice, so I am removing it instead of trying to come up with a more complex mechanism to preserve the condition. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Acked-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20230530180959.1108766-6-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-06-01block/io_uring: convert to blk_io_plug_call() APIStefan Hajnoczi
Stop using the .bdrv_co_io_plug() API because it is not multi-queue block layer friendly. Use the new blk_io_plug_call() API to batch I/O submission instead. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Kevin Wolf <kwolf@redhat.com> Message-id: 20230530180959.1108766-5-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-06-01block/blkio: convert to blk_io_plug_call() APIStefan Hajnoczi
Stop using the .bdrv_co_io_plug() API because it is not multi-queue block layer friendly. Use the new blk_io_plug_call() API to batch I/O submission instead. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Kevin Wolf <kwolf@redhat.com> Message-id: 20230530180959.1108766-4-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-06-01block/nvme: convert to blk_io_plug_call() APIStefan Hajnoczi
Stop using the .bdrv_co_io_plug() API because it is not multi-queue block layer friendly. Use the new blk_io_plug_call() API to batch I/O submission instead. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Kevin Wolf <kwolf@redhat.com> Message-id: 20230530180959.1108766-3-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-06-01block: add blk_io_plug_call() APIStefan Hajnoczi
Introduce a new API for thread-local blk_io_plug() that does not traverse the block graph. The goal is to make blk_io_plug() multi-queue friendly. Instead of having block drivers track whether or not we're in a plugged section, provide an API that allows them to defer a function call until we're unplugged: blk_io_plug_call(fn, opaque). If blk_io_plug_call() is called multiple times with the same fn/opaque pair, then fn() is only called once at the end of the function - resulting in batching. This patch introduces the API and changes blk_io_plug()/blk_io_unplug(). blk_io_plug()/blk_io_unplug() no longer require a BlockBackend argument because the plug state is now thread-local. Later patches convert block drivers to blk_io_plug_call() and then we can finally remove .bdrv_co_io_plug() once all block drivers have been converted. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Kevin Wolf <kwolf@redhat.com> Message-id: 20230530180959.1108766-2-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-05-30aio: remove aio_disable_external() APIStefan Hajnoczi
All callers now pass is_external=false to aio_set_fd_handler() and aio_set_event_notifier(). The aio_disable_external() API that temporarily disables fd handlers that were registered is_external=true is therefore dead code. Remove aio_disable_external(), aio_enable_external(), and the is_external arguments to aio_set_fd_handler() and aio_set_event_notifier(). The entire test-fdmon-epoll test is removed because its sole purpose was testing aio_disable_external(). Parts of this patch were generated using the following coccinelle (https://coccinelle.lip6.fr/) semantic patch: @@ expression ctx, fd, is_external, io_read, io_write, io_poll, io_poll_ready, opaque; @@ - aio_set_fd_handler(ctx, fd, is_external, io_read, io_write, io_poll, io_poll_ready, opaque) + aio_set_fd_handler(ctx, fd, io_read, io_write, io_poll, io_poll_ready, opaque) @@ expression ctx, notifier, is_external, io_read, io_poll, io_poll_ready; @@ - aio_set_event_notifier(ctx, notifier, is_external, io_read, io_poll, io_poll_ready) + aio_set_event_notifier(ctx, notifier, io_read, io_poll, io_poll_ready) Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20230516190238.8401-21-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-30block/fuse: do not set is_external=true on FUSE fdStefan Hajnoczi
This is part of ongoing work to remove the aio_disable_external() API. Use BlockDevOps .drained_begin/end/poll() instead of aio_set_fd_handler(is_external=true). As a side-effect the FUSE export now follows AioContext changes like the other export types. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230516190238.8401-16-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-30block/export: don't require AioContext lock around blk_exp_ref/unref()Stefan Hajnoczi
The FUSE export calls blk_exp_ref/unref() without the AioContext lock. Instead of fixing the FUSE export, adjust blk_exp_ref/unref() so they work without the AioContext lock. This way it's less error-prone. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230516190238.8401-15-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-30block/export: rewrite vduse-blk drain codeStefan Hajnoczi
vduse_blk_detach_ctx() waits for in-flight requests using AIO_WAIT_WHILE(). This is not allowed according to a comment in bdrv_set_aio_context_commit(): /* * Take the old AioContex when detaching it from bs. * At this point, new_context lock is already acquired, and we are now * also taking old_context. This is safe as long as bdrv_detach_aio_context * does not call AIO_POLL_WHILE(). */ Use this opportunity to rewrite the drain code in vduse-blk: - Use the BlockExport refcount so that vduse_blk_exp_delete() is only called when there are no more requests in flight. - Implement .drained_poll() so in-flight request coroutines are stopped by the time .bdrv_detach_aio_context() is called. - Remove AIO_WAIT_WHILE() from vduse_blk_detach_ctx() to solve the .bdrv_detach_aio_context() constraint violation. It's no longer needed due to the previous changes. - Always handle the VDUSE file descriptor, even in drained sections. The VDUSE file descriptor doesn't submit I/O, so it's safe to handle it in drained sections. This ensures that the VDUSE kernel code gets a fast response. - Suspend virtqueue fd handlers in .drained_begin() and resume them in .drained_end(). This eliminates the need for the aio_set_fd_handler(is_external=true) flag, which is being removed from QEMU. This is a long list but splitting it into individual commits would probably lead to git bisect failures - the changes are all related. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230516190238.8401-14-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-30block: drain from main loop thread in bdrv_co_yield_to_drain()Stefan Hajnoczi
For simplicity, always run BlockDevOps .drained_begin/end/poll() callbacks in the main loop thread. This makes it easier to implement the callbacks and avoids extra locks. Move the function pointer declarations from the I/O Code section to the Global State section for BlockDevOps, BdrvChildClass, and BlockDriver. Narrow IO_OR_GS_CODE() to GLOBAL_STATE_CODE() where appropriate. The test-bdrv-drain test case calls bdrv_drain() from an IOThread. This is now only allowed from coroutine context, so update the test case to run in a coroutine. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230516190238.8401-11-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-30block: add blk_in_drain() APIStefan Hajnoczi
The BlockBackend quiesce_counter is greater than zero during drained sections. Add an API to check whether the BlockBackend is in a drained section. The next patch will use this API. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230516190238.8401-10-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-30block/export: stop using is_external in vhost-user-blk serverStefan Hajnoczi
vhost-user activity must be suspended during bdrv_drained_begin/end(). This prevents new requests from interfering with whatever is happening in the drained section. Previously this was done using aio_set_fd_handler()'s is_external argument. In a multi-queue block layer world the aio_disable_external() API cannot be used since multiple AioContext may be processing I/O, not just one. Switch to BlockDevOps->drained_begin/end() callbacks. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230516190238.8401-8-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-30block/export: wait for vhost-user-blk requests when drainingStefan Hajnoczi
Each vhost-user-blk request runs in a coroutine. When the BlockBackend enters a drained section we need to enter a quiescent state. Currently any in-flight requests race with bdrv_drained_begin() because it is unaware of vhost-user-blk requests. When blk_co_preadv/pwritev()/etc returns it wakes the bdrv_drained_begin() thread but vhost-user-blk request processing has not yet finished. The request coroutine continues executing while the main loop thread thinks it is in a drained section. One example where this is unsafe is for blk_set_aio_context() where bdrv_drained_begin() is called before .aio_context_detached() and .aio_context_attach(). If request coroutines are still running after bdrv_drained_begin(), then the AioContext could change underneath them and they race with new requests processed in the new AioContext. This could lead to virtqueue corruption, for example. (This example is theoretical, I came across this while reading the code and have not tried to reproduce it.) It's easy to make bdrv_drained_begin() wait for in-flight requests: add a .drained_poll() callback that checks the VuServer's in-flight counter. VuServer just needs an API that returns true when there are requests in flight. The in-flight counter needs to be atomic. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230516190238.8401-7-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-30util/vhost-user-server: rename refcount to in_flight counterStefan Hajnoczi
The VuServer object has a refcount field and ref/unref APIs. The name is confusing because it's actually an in-flight request counter instead of a refcount. Normally a refcount destroys the object upon reaching zero. The VuServer counter is used to wake up the vhost-user coroutine when there are no more requests. Avoid confusing by renaming refcount and ref/unref to in_flight and inc/dec. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230516190238.8401-6-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-30block-backend: split blk_do_set_aio_context()Stefan Hajnoczi
blk_set_aio_context() is not fully transactional because blk_do_set_aio_context() updates blk->ctx outside the transaction. Most of the time this goes unnoticed but a BlockDevOps.drained_end() callback that invokes blk_get_aio_context() fails assert(ctx == blk->ctx). This happens because blk->ctx is only assigned after BlockDevOps.drained_end() is called and we're in an intermediate state where BlockDrvierState nodes already have the new context and the BlockBackend still has the old context. Making blk_set_aio_context() fully transactional solves this assertion failure because the BlockBackend's context is updated as part of the transaction (before BlockDevOps.drained_end() is called). Split blk_do_set_aio_context() in order to solve this assertion failure. This helper function actually serves two different purposes: 1. It drives blk_set_aio_context(). 2. It responds to BdrvChildClass->change_aio_ctx(). Get rid of the helper function. Do #1 inside blk_set_aio_context() and do #2 inside blk_root_set_aio_ctx_commit(). This simplifies the code. The only drawback of the fully transactional approach is that blk_set_aio_context() must contend with blk_root_set_aio_ctx_commit() being invoked as part of the AioContext change propagation. This can be solved by temporarily setting blk->allow_aio_context_change to true. Future patches call blk_get_aio_context() from BlockDevOps->drained_end(), so this patch will become necessary. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230516190238.8401-2-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-30copy-before-write: Fix open with child in iothreadKevin Wolf
The AioContext lock must not be held for bdrv_open_child(), but it is necessary for the following operations, in particular those using nested event loops in coroutine wrappers. Temporarily dropping the main AioContext lock is not necessary because we know we run in the main thread. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230525124713.401149-9-kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-30raw-format: Fix open with 'file' in iothreadKevin Wolf
When opening the 'file' child moves bs to an iothread, we need to hold the AioContext lock of it before we can call raw_apply_options() (and more specifically, bdrv_getlength() inside of it). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230525124713.401149-8-kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-30qcow2: Fix open with 'file' in iothreadKevin Wolf
qcow2_open() doesn't work correctly when opening the 'file' child moves bs to an iothread, for several reasons: - It uses BDRV_POLL_WHILE() to wait for the qcow2_open_entry() coroutine, which involves dropping the AioContext lock for bs when it is not in the main context - but we don't hold it, so this crashes. - It runs the qcow2_open_entry() coroutine in the current thread instead of the new AioContext of bs. - qcow2_open_entry() doesn't notify the main loop when it's done. This patches fixes these issues around delegating work to a coroutine. Temporarily dropping the main AioContext lock is not necessary because we know we run in the main thread. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230525124713.401149-7-kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-30mirror: Hold main AioContext lock for calling bdrv_open_backing_file()Kevin Wolf
bdrv_open_backing_file() calls bdrv_open_inherit(), so all callers must hold the main AioContext lock. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230525124713.401149-6-kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-30block-backend: Fix blk_new_open() for iothreadsKevin Wolf
This fixes blk_new_open() to not assume that bs is in the main context. In particular, the BlockBackend must be created with the right AioContext because it will refuse to move to a different context afterwards. (blk->allow_aio_context_change is false.) Use this opportunity to use blk_insert_bs() instead of duplicating the bdrv_root_attach_child() call. This is consistent with what blk_new_with_bs() does. Add comments to document the locking rules. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230525124713.401149-5-kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-30block: Take main AioContext lock when calling bdrv_open()Kevin Wolf
The function documentation already says that all callers must hold the main AioContext lock, but not all of them do. This can cause assertion failures when functions called by bdrv_open() try to drop the lock. Fix a few more callers to take the lock before calling bdrv_open(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230525124713.401149-4-kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-30block-coroutine-wrapper: Take AioContext lock in no_co_wrappersKevin Wolf
All of the functions that currently take a BlockDriverState, BdrvChild or BlockBackend as their first parameter expect the associated AioContext to be locked when they are called. In the case of no_co_wrappers, they are called from bottom halves directly in the main loop, so no other caller can be expected to take the lock for them. This can result in assertion failures because a lock that isn't taken is released in nested event loops. Looking at the first parameter is already done by co_wrappers to decide where the coroutine should run, so doing the same in no_co_wrappers is only consistent. Take the lock in the generated bottom halves to fix the problem. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230525124713.401149-2-kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-19graph-lock: Disable locking for nowKevin Wolf
In QEMU 8.0, we've been seeing deadlocks in bdrv_graph_wrlock(). They come from callers that hold an AioContext lock, which is not allowed during polling. In theory, we could temporarily release the lock, but callers are inconsistent about whether they hold a lock, and if they do, some are also confused about which one they hold. While all of this is fixable, it's not trivial, and the best course of action for 8.0.1 is probably just disabling the graph locking code temporarily. We don't currently rely on graph locking yet. It is supposed to replace the AioContext lock eventually to enable multiqueue support, but as long as we still have the AioContext lock, it is sufficient without the graph lock. Once the AioContext lock goes away, the deadlock doesn't exist any more either and this commit can be reverted. (Of course, it can also be reverted while the AioContext lock still exists if the callers have been fixed.) Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230517152834.277483-2-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-19graph-lock: Honour read locks even in the main threadKevin Wolf
There are some conditions under which we don't actually need to do anything for taking a reader lock: Writing the graph is only possible from the main context while holding the BQL. So if a reader is running in the main context under the BQL and knows that it won't be interrupted until the next writer runs, we don't actually need to do anything. This is the case if the reader code neither has a nested event loop (this is forbidden anyway while you hold the lock) nor is a coroutine (because a writer could run when the coroutine has yielded). These conditions are exactly what bdrv_graph_rdlock_main_loop() asserts. They are not fulfilled in bdrv_graph_co_rdlock(), which always runs in a coroutine. This deletes the shortcuts in bdrv_graph_co_rdlock() that skip taking the reader lock in the main thread. Reported-by: Fiona Ebner <f.ebner@proxmox.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230510203601.418015-9-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-19blockjob: Adhere to rate limit even when reentered earlyKevin Wolf
When jobs are sleeping, for example to enforce a given rate limit, they can be reentered early, in particular in order to get paused, to update the rate limit or to get cancelled. Before this patch, they behave in this case as if they had fully completed their rate limiting delay. This means that requests are sped up beyond their limit, violating the constraints that the user gave us. Change the block jobs to sleep in a loop until the necessary delay is completed, while still allowing cancelling them immediately as well pausing (handled by the pause point in job_sleep_ns()) and updating the rate limit. This change is also motivated by iotests cases being prone to fail because drain operations pause and unpause them so often that block jobs complete earlier than they are supposed to. In particular, the next commit would fail iotests 030 without this change. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230510203601.418015-8-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-19qcow2: Unlock the graph in qcow2_do_open() where necessaryKevin Wolf
qcow2_do_open() calls a few no_co_wrappers that wrap functions taking the graph lock internally as a writer. Therefore, it can't hold the reader lock across these calls, it causes deadlocks. Drop the lock temporarily around the calls. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230510203601.418015-4-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-19block/export: Fix null pointer dereference in error pathKevin Wolf
There are some error paths in blk_exp_add() that jump to 'fail:' before 'exp' is even created. So we can't just unconditionally access exp->blk. Add a NULL check, and switch from exp->blk to blk, which is available earlier, just to be extra sure that we really cover all cases where BlockDevOps could have been set for it (in practice, this only happens in drv->create() today, so this part of the change isn't strictly necessary). Fixes: Coverity CID 1509238 Fixes: de79b52604e43fdeba6cee4f5af600b62169f2d2 Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230510203601.418015-3-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Tested-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-19block: Call .bdrv_co_create(_opts) unlockedKevin Wolf
These are functions that modify the graph, so they must be able to take a writer lock. This is impossible if they already hold the reader lock. If they need a reader lock for some of their operations, they should take it internally. Many of them go through blk_*(), which will always take the lock itself. Direct calls of bdrv_*() need to take the reader lock. Note that while locking for bdrv_co_*() calls is checked by TSA, this is not the case for the mixed_coroutine_fns bdrv_*(). Holding the lock is still required when they are called from coroutine context like here! This effectively reverts 4ec8df0183, but adds some internal locking instead. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230510203601.418015-2-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-15block: add accounting for zone append operationSam Li
Taking account of the new zone append write operation for zoned devices, BLOCK_ACCT_ZONE_APPEND enum is introduced as other I/O request type (read, write, flush). Signed-off-by: Sam Li <faithilikerun@gmail.com> Message-id: 20230508051916.178322-3-faithilikerun@gmail.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-05-15block: add some trace events for zone appendSam Li
Signed-off-by: Sam Li <faithilikerun@gmail.com> Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20230508051510.177850-5-faithilikerun@gmail.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-05-15block: introduce zone append write for zoned devicesSam Li
A zone append command is a write operation that specifies the first logical block of a zone as the write position. When writing to a zoned block device using zone append, the byte offset of the call may point at any position within the zone to which the data is being appended. Upon completion the device will respond with the position where the data has been written in the zone. Signed-off-by: Sam Li <faithilikerun@gmail.com> Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20230508051510.177850-3-faithilikerun@gmail.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-05-15file-posix: add tracking of the zone write pointersSam Li
Since Linux doesn't have a user API to issue zone append operations to zoned devices from user space, the file-posix driver is modified to add zone append emulation using regular writes. To do this, the file-posix driver tracks the wp location of all zones of the device. It uses an array of uint64_t. The most significant bit of each wp location indicates if the zone type is conventional zones. The zones wp can be changed due to the following operations issued: - zone reset: change the wp to the start offset of that zone - zone finish: change to the end location of that zone - write to a zone - zone append Signed-off-by: Sam Li <faithilikerun@gmail.com> Message-id: 20230508051510.177850-2-faithilikerun@gmail.com [Fix errno propagation from handle_aiocb_zone_mgmt() --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-05-15block: add some trace events for new block layer APIsSam Li
Signed-off-by: Sam Li <faithilikerun@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Acked-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20230508045533.175575-8-faithilikerun@gmail.com Message-id: 20230324090605.28361-8-faithilikerun@gmail.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-05-15block: add zoned BlockDriver check to block layerSam Li
Putting zoned/non-zoned BlockDrivers on top of each other is not allowed. Signed-off-by: Sam Li <faithilikerun@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Acked-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20230508045533.175575-6-faithilikerun@gmail.com Message-id: 20230324090605.28361-6-faithilikerun@gmail.com [Adjust commit message prefix as suggested by Philippe Mathieu-Daudé <philmd@linaro.org> and clarify that the check is about zoned BlockDrivers. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-05-15block/raw-format: add zone operations to pass through requestsSam Li
raw-format driver usually sits on top of file-posix driver. It needs to pass through requests of zone commands. Signed-off-by: Sam Li <faithilikerun@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Acked-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20230508045533.175575-5-faithilikerun@gmail.com Message-id: 20230324090605.28361-5-faithilikerun@gmail.com [Adjust commit message prefix as suggested by Philippe Mathieu-Daudé <philmd@linaro.org>. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-05-15block/block-backend: add block layer APIs resembling Linux ZonedBlockDevice ↵Sam Li
ioctls Add zoned device option to host_device BlockDriver. It will be presented only for zoned host block devices. By adding zone management operations to the host_block_device BlockDriver, users can use the new block layer APIs including Report Zone and four zone management operations (open, close, finish, reset, reset_all). Qemu-io uses the new APIs to perform zoned storage commands of the device: zone_report(zrp), zone_open(zo), zone_close(zc), zone_reset(zrs), zone_finish(zf). For example, to test zone_report, use following command: $ ./build/qemu-io --image-opts -n driver=host_device, filename=/dev/nullb0 -c "zrp offset nr_zones" Signed-off-by: Sam Li <faithilikerun@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Acked-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20230508045533.175575-4-faithilikerun@gmail.com Message-id: 20230324090605.28361-4-faithilikerun@gmail.com [Adjust commit message prefix as suggested by Philippe Mathieu-Daudé <philmd@linaro.org> and remove spurious ret = -errno in raw_co_zone_mgmt(). --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-05-15block/file-posix: introduce helper functions for sysfs attributesSam Li
Use get_sysfs_str_val() to get the string value of device zoned model. Then get_sysfs_zoned_model() can convert it to BlockZoneModel type of QEMU. Use get_sysfs_long_val() to get the long value of zoned device information. Signed-off-by: Sam Li <faithilikerun@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Acked-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20230508045533.175575-3-faithilikerun@gmail.com Message-id: 20230324090605.28361-3-faithilikerun@gmail.com [Adjust commit message prefix as suggested by Philippe Mathieu-Daudé <philmd@linaro.org>. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-05-10block/meson.build: prefer positive condition for replicationVladimir Sementsov-Ogievskiy
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Lukas Straub <lukasstraub2@web.de> Reviewed-by: Zhang Chen <chen.zhang@intel.com> Message-Id: <20230428194928.1426370-2-vsementsov@yandex-team.ru> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-05-10block: compile out assert_bdrv_graph_readable() by defaultStefan Hajnoczi
reader_count() is a performance bottleneck because the global aio_context_list_lock mutex causes thread contention. Put this debugging assertion behind a new ./configure --enable-debug-graph-lock option and disable it by default. The --enable-debug-graph-lock option is also enabled by the more general --enable-debug option. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20230501173443.153062-1-stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-10block: Mark bdrv_refresh_limits() and callers GRAPH_RDLOCKKevin Wolf
This adds GRAPH_RDLOCK annotations to declare that callers of bdrv_refresh_limits() need to hold a reader lock for the graph because it accesses the children list of a node. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20230504115750.54437-21-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-10block: Mark bdrv_recurse_can_replace() and callers GRAPH_RDLOCKKevin Wolf
This adds GRAPH_RDLOCK annotations to declare that callers of bdrv_recurse_can_replace() need to hold a reader lock for the graph because it accesses the children list of a node. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20230504115750.54437-20-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-10block: Mark bdrv_query_bds_stats() and callers GRAPH_RDLOCKKevin Wolf
This adds GRAPH_RDLOCK annotations to declare that callers of bdrv_query_bds_stats() need to hold a reader lock for the graph because it accesses the children list of a node. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20230504115750.54437-18-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-10block: Mark BlockDriver callbacks for amend job GRAPH_RDLOCKEmanuele Giuseppe Esposito
This adds GRAPH_RDLOCK annotations to declare that callers of amend callbacks in BlockDriver need to hold a reader lock for the graph. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20230504115750.54437-17-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-10block: Mark bdrv_co_get_info() and callers GRAPH_RDLOCKEmanuele Giuseppe Esposito
This adds GRAPH_RDLOCK annotations to declare that callers of bdrv_co_get_info() need to hold a reader lock for the graph. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20230504115750.54437-15-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-10block: Mark bdrv_co_get_allocated_file_size() and callers GRAPH_RDLOCKEmanuele Giuseppe Esposito
This adds GRAPH_RDLOCK annotations to declare that callers of bdrv_co_get_allocated_file_size() need to hold a reader lock for the graph. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20230504115750.54437-14-kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-10mirror: Require GRAPH_RDLOCK for accessing a node's parent listKevin Wolf
This adds GRAPH_RDLOCK annotations to declare that functions accessing the parent list of a node need to hold a reader lock for the graph. As it happens, they already do. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230504115750.54437-13-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>