aboutsummaryrefslogtreecommitdiff
path: root/block
AgeCommit message (Collapse)Author
2021-06-28Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into ↵Peter Maydell
staging * Some Meson test conversions * KVM dirty page ring buffer fix * KVM TSC scaling support * Fixes for SG_IO with /dev/sdX devices * (Non)support for host devices on iOS * -smp cleanups # gpg: Signature made Fri 25 Jun 2021 15:16:18 BST # gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83 # gpg: issuer "pbonzini@redhat.com" # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini-gitlab/tags/for-upstream: (28 commits) machine: reject -smp dies!=1 for non-PC machines machine: pass QAPI struct to mc->smp_parse machine: add error propagation to mc->smp_parse machine: move common smp_parse code to caller machine: move dies from X86MachineState to CpuTopology file-posix: handle EINTR during ioctl block: detect DKIOCGETBLOCKCOUNT/SIZE before use block: try BSD disk size ioctls one after another block: check for sys/disk.h block: feature detection for host block support file-posix: try BLKSECTGET on block devices too, do not round to power of 2 block: add max_hw_transfer to BlockLimits block-backend: align max_transfer to request alignment osdep: provide ROUND_DOWN macro scsi-generic: pass max_segments via max_iov field in BlockLimits file-posix: fix max_iov for /dev/sg devices KVM: Fix dirty ring mmap incorrect size due to renaming accident configure, meson: convert libusbredir detection to meson configure, meson: convert libcacard detection to meson configure, meson: convert libusb detection to meson ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-06-28Merge remote-tracking branch 'remotes/vsementsov/tags/pull-jobs-2021-06-25' ↵Peter Maydell
into staging block: Make block-copy API thread-safe # gpg: Signature made Fri 25 Jun 2021 13:40:24 BST # gpg: using RSA key 8B9C26CDB2FD147C880E86A1561F24C1F19F79FB # gpg: Good signature from "Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>" [unknown] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: 8B9C 26CD B2FD 147C 880E 86A1 561F 24C1 F19F 79FB * remotes/vsementsov/tags/pull-jobs-2021-06-25: block-copy: atomic .cancelled and .finished fields in BlockCopyCallState block-copy: add CoMutex lock block-copy: move progress_set_remaining in block_copy_task_end block-copy: streamline choice of copy_range vs. read/write block-copy: small refactor in block_copy_task_entry and block_copy_common co-shared-resource: protect with a mutex progressmeter: protect with a mutex blockjob: let ratelimit handle a speed of 0 block-copy: let ratelimit handle a speed of 0 ratelimit: treat zero speed as unlimited Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-06-25block-copy: atomic .cancelled and .finished fields in BlockCopyCallStateEmanuele Giuseppe Esposito
By adding acquire/release pairs, we ensure that .ret and .error_is_read fields are written by block_copy_dirty_clusters before .finished is true, and that they are read by API user after .finished is true. The atomic here are necessary because the fields are concurrently modified in coroutines, and read outside. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20210624072043.180494-6-eesposit@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2021-06-25block-copy: add CoMutex lockEmanuele Giuseppe Esposito
Group various structures fields, to better understand what we need to protect with a lock and what doesn't need it. Then, add a CoMutex to protect concurrent access of block-copy data structures. This mutex also protects .copy_bitmap, because its thread-safe API does not prevent it from assigning two tasks to the same bitmap region. Exceptions to the lock: - .sleep_state is handled in the series "coroutine: new sleep/wake API" and thus here left as TODO. - .finished, .cancelled and reads to .ret and .error_is_read will be protected in the following patch, because are used also outside coroutines. - .skip_unallocated is atomic. Including it under the mutex would increase the critical sections and make them also much more complex. We can have it as atomic since it is only written from outside and read by block-copy coroutines. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20210624072043.180494-5-eesposit@redhat.com> [vsementsov: fix typo in comment] Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2021-06-25block-copy: move progress_set_remaining in block_copy_task_endEmanuele Giuseppe Esposito
Moving this function in task_end ensures to update the progress anyways, even if there is an error. It also helps in next patch, allowing task_end to have only one critical section. Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20210624072043.180494-4-eesposit@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2021-06-25block-copy: streamline choice of copy_range vs. read/writePaolo Bonzini
Put the logic to determine the copy size in a separate function, so that there is a simple state machine for the possible methods of copying data from one BlockDriverState to the other. Use .method instead of .copy_range as in-out argument, and include also .zeroes as an additional copy method. While at it, store the common computation of block_copy_max_transfer into a new field of BlockCopyState, and make sure that we always obey max_transfer; that's more efficient even for the COPY_RANGE_READ_WRITE case. Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210624072043.180494-3-eesposit@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2021-06-25block-copy: small refactor in block_copy_task_entry and block_copy_commonEmanuele Giuseppe Esposito
Use a local variable instead of referencing BlockCopyState through a BlockCopyCallState or BlockCopyTask every time. This is in preparation for next patches. No functional change intended. Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20210624072043.180494-2-eesposit@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2021-06-25progressmeter: protect with a mutexEmanuele Giuseppe Esposito
Progressmeter is protected by the AioContext mutex, which is taken by the block jobs and their caller (like blockdev). We would like to remove the dependency of block layer code on the AioContext mutex, since most drivers and the core I/O code are already not relying on it. Create a new C file to implement the ProgressMeter API, but keep the struct as public, to avoid forcing allocation on the heap. Also add a mutex to be able to provide an accurate snapshot of the progress values to the caller. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20210614081130.22134-5-eesposit@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2021-06-25block-copy: let ratelimit handle a speed of 0Paolo Bonzini
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20210614081130.22134-3-eesposit@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2021-06-25file-posix: handle EINTR during ioctlPaolo Bonzini
Similar to other handle_aiocb_* functions, handle_aiocb_ioctl needs to cater for the possibility that ioctl is interrupted by a signal. Otherwise, the I/O is incorrectly reported as a failure to the guest. Reported-by: Gordon Watson <gwatson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25block: detect DKIOCGETBLOCKCOUNT/SIZE before useJoelle van Dyne
iOS hosts do not have these defined so we fallback to the default behaviour. Co-authored-by: Warner Losh <imp@bsdimp.com> Signed-off-by: Joelle van Dyne <j@getutm.app> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25block: try BSD disk size ioctls one after anotherPaolo Bonzini
Try all the possible ioctls for disk size as long as they are supported, to keep the #if ladder simple. Extracted and cleaned up from a patch by Joelle van Dyne and Warner Losh. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25block: feature detection for host block supportJoelle van Dyne
On Darwin (iOS), there are no system level APIs for directly accessing host block devices. We detect this at configure time. Signed-off-by: Joelle van Dyne <j@getutm.app> Message-Id: <20210315180341.31638-2-j@getutm.app> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25file-posix: try BLKSECTGET on block devices too, do not round to power of 2Paolo Bonzini
bs->sg is only true for character devices, but block devices can also be used with scsi-block and scsi-generic. Unfortunately BLKSECTGET returns bytes in an int for /dev/sgN devices, and sectors in a short for block devices, so account for that in the code. The maximum transfer also need not be a power of 2 (for example I have seen disks with 1280 KiB maximum transfer) so there's no need to pass the result through pow2floor. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25block: add max_hw_transfer to BlockLimitsPaolo Bonzini
For block host devices, I/O can happen through either the kernel file descriptor I/O system calls (preadv/pwritev, io_submit, io_uring) or the SCSI passthrough ioctl SG_IO. In the latter case, the size of each transfer can be limited by the HBA, while for file descriptor I/O the kernel is able to split and merge I/O in smaller pieces as needed. Applying the HBA limits to file descriptor I/O results in more system calls and suboptimal performance, so this patch splits the max_transfer limit in two: max_transfer remains valid and is used in general, while max_hw_transfer is limited to the maximum hardware size. max_hw_transfer can then be included by the scsi-generic driver in the block limits page, to ensure that the stricter hardware limit is used. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25block-backend: align max_transfer to request alignmentPaolo Bonzini
Block device requests must be aligned to bs->bl.request_alignment. It makes sense for drivers to align bs->bl.max_transfer the same way; however when there is no specified limit, blk_get_max_transfer just returns INT_MAX. Since the contract of the function does not specify that INT_MAX means "no maximum", just align the outcome of the function (whether INT_MAX or bs->bl.max_transfer) before returning it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25scsi-generic: pass max_segments via max_iov field in BlockLimitsPaolo Bonzini
I/O to a disk via read/write is not limited by the number of segments allowed by the host adapter; the kernel can split requests if needed, and the limit imposed by the host adapter can be very low (256k or so) to avoid that SG_IO returns EINVAL if memory is heavily fragmented. Since this value is only interesting for SG_IO-based I/O, do not include it in the max_transfer and only take it into account when patching the block limits VPD page in the scsi-generic device. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2021-06-25file-posix: fix max_iov for /dev/sg devicesPaolo Bonzini
Even though it was only called for devices that have bs->sg set (which must be character devices), sg_get_max_segments looked at /sys/dev/block which only works for block devices. On Linux the sg driver has its own way to provide the maximum number of iovecs in a scatter/gather list, so add support for it. The block device path is kept because it will be reinstated in the next patches. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2021-06-24block/snapshot: Clarify goto fallback behaviorMax Reitz
In the bdrv_snapshot_goto() fallback code, we work with a pointer to either bs->file or bs->backing. We detach that child, close the node (with .bdrv_close()), apply the snapshot on the child node, and then re-open the node (with .bdrv_open()). In order for .bdrv_open() to attach the same child node that we had before, we pass "file={child-node}" or "backing={child-node}" to it. Therefore, when .bdrv_open() has returned success, we can assume that bs->file or bs->backing (respectively) points to our original child again. This is verified by an assertion. All of this is not immediately clear from a quick glance at the code, so add a comment to the assertion what it is for, and why it is valid. It certainly confused Coverity. Reported-by: Coverity (CID 1452774) Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210503095418.31521-1-mreitz@redhat.com> [mreitz: s/close/detach/] Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2021-06-18block/nbd: safer transition to receiving requestVladimir Sementsov-Ogievskiy
req->receiving is a flag of request being in one concrete yield point in nbd_co_do_receive_one_chunk(). Such kind of boolean flag is always better to unset before scheduling the coroutine, to avoid double scheduling. So, let's be more careful. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-33-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18block/nbd: add nbd_client_connected() helperVladimir Sementsov-Ogievskiy
We already have two similar helpers for other state. Let's add another one for convenience. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-32-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18block/nbd: reuse nbd_co_do_establish_connection() in nbd_open()Vladimir Sementsov-Ogievskiy
The only last step we need to reuse the function is coroutine-wrapper. nbd_open() may be called from non-coroutine context. So, generate the wrapper and use it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-31-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18nbd/client-connection: add option for non-blocking connection attemptVladimir Sementsov-Ogievskiy
We'll need a possibility of non-blocking nbd_co_establish_connection(), so that it returns immediately, and it returns success only if a connections was previously established in background. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-30-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18block/nbd: split nbd_co_do_establish_connection out of nbd_reconnect_attemptVladimir Sementsov-Ogievskiy
Split out the part that we want to reuse for nbd_open(). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210610100802.5888-29-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18nbd/client-connection: return only one io channelVladimir Sementsov-Ogievskiy
block/nbd doesn't need underlying sioc channel anymore. So, we can update nbd/client-connection interface to return only one top-most io channel, which is more straight forward. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-27-vsementsov@virtuozzo.com> [eblake: squash in Vladimir's fixes for uninit usage caught by clang] Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18block/nbd: drop BDRVNBDState::siocVladimir Sementsov-Ogievskiy
Currently sioc pointer is used just to pass from socket-connection to nbd negotiation. Drop the field, and use local variables instead. With next commit we'll update nbd/client-connection.c to behave appropriately (return only top-most ioc, not two channels). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-26-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18block/nbd: don't touch s->sioc in nbd_teardown_connection()Vladimir Sementsov-Ogievskiy
Negotiation during reconnect is now done in a thread, and s->sioc is not available during negotiation. Negotiation in thread will be cancelled by nbd_client_connection_release() called from nbd_clear_bdrvstate(). So, we don't need this code chunk anymore. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-25-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18block/nbd: use negotiation of NBDClientConnectionVladimir Sementsov-Ogievskiy
Now that we can opt in to negotiation as part of the client connection thread, use that to simplify connection_co. This is another step on the way to moving all reconnect code into NBDClientConnection. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-24-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18block/nbd: split nbd_handle_updated_info out of nbd_client_handshake()Vladimir Sementsov-Ogievskiy
To be reused in the following patch. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-23-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18nbd/client-connection: add possibility of negotiationVladimir Sementsov-Ogievskiy
Add arguments and logic to support nbd negotiation in the same thread after successful connection. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210610100802.5888-20-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18nbd: move connection code from block/nbd to nbd/client-connectionVladimir Sementsov-Ogievskiy
We now have bs-independent connection API, which consists of four functions: nbd_client_connection_new() nbd_client_connection_release() nbd_co_establish_connection() nbd_co_establish_connection_cancel() Move them to a separate file together with NBDClientConnection structure which becomes private to the new API. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-18-vsementsov@virtuozzo.com> [eblake: comment tweaks] Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18block/nbd: introduce nbd_client_connection_release()Vladimir Sementsov-Ogievskiy
This is a last step of creating bs-independent nbd connection interface. With next commit we can finally move it to separate file. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210610100802.5888-17-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18block/nbd: introduce nbd_client_connection_new()Vladimir Sementsov-Ogievskiy
This is a step of creating bs-independent nbd connection interface. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-16-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18block/nbd: rename NBDConnectThread to NBDClientConnectionVladimir Sementsov-Ogievskiy
We are going to move the connection code to its own file, and want clear names and APIs first. The structure is shared between user and (possibly) several runs of connect-thread. So it's wrong to call it "thread". Let's rename to something more generic. Appropriately rename connect_thread and thr variables to conn. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-15-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18block/nbd: make nbd_co_establish_connection_cancel() bs-independentVladimir Sementsov-Ogievskiy
nbd_co_establish_connection_cancel() actually needs only pointer to NBDConnectThread. So, make it clean. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-14-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18block/nbd: bs-independent interface for nbd_co_establish_connection()Vladimir Sementsov-Ogievskiy
We are going to split connection code to a separate file. Now we are ready to give nbd_co_establish_connection() clean and bs-independent interface. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-13-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18block/nbd: drop thr->stateVladimir Sementsov-Ogievskiy
We don't need all these states. The code refactored to use two boolean variables looks simpler. While moving the comment in nbd_co_establish_connection() rework it to give better information. Also, we are going to move the connection code to separate file and mentioning drained section would be confusing. Improve also the comment in NBDConnectThread, while dropping removed state names from it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210610100802.5888-12-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: comment tweak] Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18block/nbd: simplify waking of nbd_co_establish_connection()Vladimir Sementsov-Ogievskiy
Instead of managing connect_bh, bh_ctx, and wait_connect fields, we can use a single link to the waiting coroutine with proper mutex protection. So new logic is: nbd_co_establish_connection() sets wait_co under the mutex, releases the mutex, then yield()s. Note that wait_co may be scheduled by the thread immediately after unlocking the mutex. Still, the main thread (or iothread) will not reach the code for entering the coroutine until the yield(), so we are safe. connect_thread_func() and nbd_co_establish_connection_cancel() do the following to handle wait_co: Under the mutex, if thr->wait_co is not NULL, make it NULL and schedule it. This way, we avoid scheduling the coroutine twice. Still scheduling is a bit different: In connect_thread_func() we can just call aio_co_wake under mutex, after commit [async: the main AioContext is only "current" if under the BQL] we are sure that aio_co_wake() will not try to acquire the aio context and do qemu_aio_coroutine_enter() but simply schedule the coroutine by aio_co_schedule(). nbd_co_establish_connection_cancel() will be called from non-coroutine context in further patch and will be able to go through qemu_aio_coroutine_enter() path of aio_co_wake(). So keep current behavior of waking the coroutine after the critical section. Also, this commit reduces the dependence of nbd_co_establish_connection() on the internals of bs (we now use a generic pointer to the coroutine, instead of direct use of s->connection_co). This is a step towards splitting the connection API out of nbd.c. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210610100802.5888-11-vsementsov@virtuozzo.com> Reviewied-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18block/nbd: BDRVNBDState: drop unused connect_err and connect_statusVladimir Sementsov-Ogievskiy
These fields are write-only. Drop them. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-10-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18block/nbd: nbd_client_handshake(): fix leak of s->iocVladimir Sementsov-Ogievskiy
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru> Message-Id: <20210610100802.5888-9-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18block/nbd: ensure ->connection_thread is always validRoman Kagan
Simplify lifetime management of BDRVNBDState->connect_thread by delaying the possible cleanup of it until the BDRVNBDState itself goes away. This also reverts 0267101af6 "block/nbd: fix possible use after free of s->connect_thread" as now s->connect_thread can't be cleared until the very end. Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru> [vsementsov: rebase, revert 0267101af6 changes] Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [eblake: tweak comment] Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-8-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18block/nbd: call socket_address_parse_named_fd() in advanceVladimir Sementsov-Ogievskiy
Detecting monitor by current coroutine works bad when we are not in coroutine context. And that's exactly so in nbd reconnect code, where qio_channel_socket_connect_sync() is called from thread. Monitor is needed only to parse named file descriptor. So, let's just parse it during nbd_open(), so that all further users of s->saddr don't need to access monitor. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210610100802.5888-7-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18block/nbd: connect_thread_func(): do qio_channel_set_delay(false)Vladimir Sementsov-Ogievskiy
nbd_open() does it (through nbd_establish_connection()). Actually we lost that call on reconnect path in 1dc4718d849e1a1fe "block/nbd: use non-blocking connect: fix vm hang on connect()" when we have introduced reconnect thread. Fixes: 1dc4718d849e1a1fe665ce5241ed79048cfa2cfc Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210610100802.5888-5-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18block/nbd: fix how state is cleared on nbd_open() failure pathsVladimir Sementsov-Ogievskiy
We have two "return error" paths in nbd_open() after nbd_process_options(). Actually we should call nbd_clear_bdrvstate() on these paths. Interesting that nbd_process_options() calls nbd_clear_bdrvstate() by itself. Let's fix leaks and refactor things to be more obvious: - intialize yank at top of nbd_open() - move yank cleanup to nbd_clear_bdrvstate() - refactor nbd_open() so that all failure paths except for yank-register goes through nbd_clear_bdrvstate() Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-4-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-18block/nbd: fix channel object leakRoman Kagan
nbd_free_connect_thread leaks the channel object if it hasn't been stolen. Unref it and fix the leak. Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-3-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2021-06-14block: use GDateTime for formatting timestamp when dumping snapshot infoDaniel P. Berrangé
The GDateTime APIs provided by GLib avoid portability pitfalls, such as some platforms where 'struct timeval.tv_sec' field is still 'long' instead of 'time_t'. When combined with automatic cleanup, GDateTime often results in simpler code too. Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2021-06-14block: remove duplicate trace.h includeDaniel P. Berrangé
Reviewed-by: Connor Kuehl <ckuehl@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2021-06-14block: add trace point when fdatasync failsDaniel P. Berrangé
A flush failure is a critical failure scenario for some operations. For example, it will prevent migration from completing, as it will make vm_stop() report an error. Thus it is important to have a trace point present for debugging. Reviewed-by: Connor Kuehl <ckuehl@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2021-06-14block: preserve errno from fdatasync failuresDaniel P. Berrangé
When fdatasync() fails on a file backend we set a flag that short-circuits any future attempts to call fdatasync(). The first failure returns the true errno, but the later short- circuited calls return a generic EIO. The latter is unhelpful because fdatasync() can return a variety of errnos, including EACCESS. Reviewed-by: Connor Kuehl <ckuehl@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2021-06-04iscsi: link libm into the modulePaolo Bonzini
Depending on the configuration of QEMU, some binaries might not need libm at all. In that case libiscsi, which uses exp(), will fail to load. Link it in the module explicitly. Reported-by: Yi Sun <yisun@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>