aboutsummaryrefslogtreecommitdiff
path: root/block/nbd-client.c
AgeCommit message (Collapse)Author
2018-07-02nbd/client: Add x-dirty-bitmap to query bitmap from serverEric Blake
In order to test that the NBD server is properly advertising dirty bitmaps, we need a bare minimum client that can request and read the context. Since feature freeze for 3.0 is imminent, this is the smallest workable patch, which replaces the qemu block status report with the results of the NBD server's dirty bitmap (making it very easy to use 'qemu-img map --output=json' to learn where the dirty portions are). Note that the NBD protocol defines a dirty section with the same bit but opposite sense that normal "base:allocation" uses to report an allocated section; so in qemu-img map output, "data":true corresponds to clean, "data":false corresponds to dirty. A more complete solution that allows dirty bitmaps to be queried at the same time as normal block status will be required before this addition can lose the x- prefix. Until then, the fact that this replaces normal status with dirty status means actions like 'qemu-img convert' will likely misbehave due to treating dirty regions of the file as if they are unallocated. The next patch adds an iotest to exercise this new code. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20180702191458.28741-2-eblake@redhat.com>
2018-05-04nbd/client: Relax handling of large NBD_CMD_BLOCK_STATUS replyEric Blake
The NBD spec is proposing a relaxation of NBD_CMD_BLOCK_STATUS where a server may have the final extent per context give a length beyond the original request, if it can easily prove that subsequent bytes have the same status, on the grounds that a client can take advantage of this information for fewer block status requests. Since qemu 2.12 as a client always sends NBD_CMD_FLAG_REQ_ONE, and rejects a server that sends extra length, the upstream NBD spec will probably limit this behavior to clients that don't request REQ_ONE semantics; but it doesn't hurt to relax qemu to always be permissive of this server behavior, even if it continues to use REQ_ONE. CC: qemu-stable@nongnu.org Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20180503222626.1303410-1-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2018-04-02nbd: Fix 32-bit compilation on BLOCK_STATUSEric Blake
iotests 123 and 209 fail on 32-bit platforms. The culprit: sizeof(extent) is wrong; we want sizeof(*extent). But since the struct is 8 bytes, it happened to work on 64-bit platforms where the pointer is also 8 bytes (nasty). Fixes: 78a33ab58 Reported-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20180327210517.1804242-1-eblake@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2018-03-13nbd: BLOCK_STATUS for standard get_block_status function: client partVladimir Sementsov-Ogievskiy
Minimal realization: only one extent in server answer is supported. Flag NBD_CMD_FLAG_REQ_ONE is used to force this behavior. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180312152126.286890-6-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: grammar tweaks, fix min_block check and 32-bit cap, use -1 instead of errno on failure in nbd_negotiate_simple_meta_context, ensure that block status makes progress on success] Signed-off-by: Eric Blake <eblake@redhat.com>
2018-03-13block/nbd-client: save first fatal error in nbd_iter_errorVladimir Sementsov-Ogievskiy
It is ok, that fatal error hides previous not fatal, but hiding first fatal error is a bad feature. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180312152126.286890-5-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2018-03-01nbd: Honor server's advertised minimum block sizeEric Blake
Commit 79ba8c98 (v2.7) changed the setting of request_alignment to occur only during bdrv_refresh_limits(), rather than at at bdrv_open() time; but at the time, NBD was unaffected, because it still used sector-based callbacks, so the block layer defaulted NBD to use 512 request_alignment. Later, commit 70c4fb26 (also v2.7) changed NBD to use byte-based callbacks, without setting request_alignment. This resulted in NBD using request_alignment of 1, which works great when the server supports it (as is the case for qemu-nbd), but falls apart miserably if the server requires alignment (but only if qemu actually sends a sub-sector request; qemu-io can do it, but most qemu operations still perform on sectors or larger). Even later, the NBD protocol was updated to document that clients should learn the server's minimum alignment during NBD_OPT_GO; and recommended that clients should assume a minimum size of 512 unless the server understands NBD_OPT_GO and replied with a smaller size. Commit 081dd1fe (v2.10) attempted to do that, by assigning request_alignment to whatever was learned from the server; but it has two flaws: the assignment is done during bdrv_open() so it gets unconditionally wiped out back to 1 during any later bdrv_refresh_limits(); and the code is not using a default of 512 when the server did not report a minimum size. Fix these issues by moving the assignment to request_alignment to the right function, and by using a sane default when the server does not advertise a minimum size. CC: qemu-stable@nongnu.org Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20180215032905.27146-1-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy<vsementsov@virtuozzo.com>
2017-11-17nbd: Don't crash when server reports NBD_CMD_READ failureEric Blake
If a server fails a read, for example with EIO, but the connection is still live, then we would crash trying to print a non-existent error message in nbd_client_co_preadv(). For consistency, also change the error printout in nbd_read_reply_entry(), although that instance does not crash. Bug introduced in commit f140e300. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20171112013936.5942-1-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2017-11-09nbd-client: Stricter enforcing of structured reply specEric Blake
Ensure that the server is not sending unexpected chunk lengths for either the NONE or the OFFSET_DATA chunk, nor unexpected hole length for OFFSET_HOLE. This will flag any server as broken that responds to a zero-length read with an OFFSET_DATA (what our server currently does, but that's about to be fixed) or with OFFSET_HOLE, even though we previously fixed our client to never be able to send such a request over the wire. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20171108215703.9295-7-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2017-11-09nbd-client: Short-circuit 0-length operationsEric Blake
The NBD spec was recently clarified to state that clients should not send 0-length requests to the server, as the server behavior is undefined [1]. We know that qemu-nbd's behavior is a successful no-op (once it has filtered for read-only exports), but other NBD implementations might return an error. To avoid any questionable server implementations, it is better to just short-circuit such requests on the client side (we are relying on the block layer to already filter out requests such as invalid offset, write to a read-only volume, and so forth); do the short-circuit as late as possible to still benefit from protections from assertions that the block layer is not violating our assumptions. [1] https://github.com/NetworkBlockDevice/nbd/commit/ee926037 Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20171108215703.9295-6-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2017-11-09nbd-client: Refuse read-only client with BDRV_O_RDWREric Blake
The NBD spec says that clients should not try to write/trim to an export advertised as read-only by the server. But we failed to check that, and would allow the block layer to use NBD with BDRV_O_RDWR even when the server is read-only, which meant we were depending on the server sending a proper EPERM failure for various commands, and also exposes a leaky abstraction: using qemu-io in read-write mode would succeed on 'w -z 0 0' because of local short-circuiting logic, but 'w 0 0' would send a request over the wire (where it then depends on the server, and fails at least for qemu-nbd but might pass for other NBD implementations). With this patch, a client MUST request read-only mode to access a server that is doing a read-only export, or else it will get a message like: can't open device nbd://localhost:10809/foo: request for write access conflicts with read-only export It is no longer possible to even attempt writes over the wire (including the corner case of 0-length writes), because the block layer enforces the explicit read-only request; this matches the behavior of qcow2 when backed by a read-only POSIX file. Fix several iotests to comply with the new behavior (since qemu-nbd of an internal snapshot, as well as nbd-server-add over QMP, default to a read-only export, we must tell blockdev-add/qemu-io to set up a read-only client). CC: qemu-stable@nongnu.org Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20171108215703.9295-3-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2017-11-09nbd-client: Fix error message typosEric Blake
Provide missing spaces that are required when using string concatenation to break error messages across source lines. Introduced in commit f140e300. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20171108215703.9295-2-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2017-10-30nbd: Minimal structured read for clientVladimir Sementsov-Ogievskiy
Minimal implementation: for structured error only error_report error message. Note that test 83 is now more verbose, because the implementation prints more warnings about unexpected communication errors; perhaps future patches should tone things down by using trace messages instead of traces, but the common case of successful communication is no noisier than before. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20171027104037.8319-13-eblake@redhat.com>
2017-10-30nbd/client: prepare nbd_receive_reply for structured replyVladimir Sementsov-Ogievskiy
In following patch nbd_receive_reply will be used both for simple and structured reply header receiving. NBDReply is altered into union of simple reply header and structured reply chunk header, simple error translation moved to block/nbd-client to be consistent with further structured reply error translation. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20171027104037.8319-11-eblake@redhat.com>
2017-10-12block/nbd-client: refactor nbd_co_receive_replyVladimir Sementsov-Ogievskiy
Pass handle parameter directly, as the whole request isn't needed. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20171012095319.136610-3-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2017-10-12block/nbd-client: assert qiov len once in nbd_co_requestVladimir Sementsov-Ogievskiy
Also improve the assertion: check that qiov is NULL for other commands than CMD_READ and CMD_WRITE. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20171012095319.136610-2-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2017-09-25block/nbd-client: nbd_co_send_request: fix return codeVladimir Sementsov-Ogievskiy
It's incorrect to return success rc >= 0 if we skip qio_channel_writev_all() call due to s->quit. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20170920124507.18841-4-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2017-09-25block/nbd-client: simplify check in nbd_co_receive_replyVladimir Sementsov-Ogievskiy
If we are woken up from while() loop in nbd_read_reply_entry handles must be equal. If we are woken up from nbd_recv_coroutines_wake_all s->quit must be true, so we do not need checking handles equality. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20170920124507.18841-3-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2017-09-25block/nbd-client: refactor nbd_co_receive_replyVladimir Sementsov-Ogievskiy
"NBDReply *reply" parameter of nbd_co_receive_reply is used only to pass return value for nbd_co_request (reply.error). Remove it and use function return value instead. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20170920124507.18841-2-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2017-09-25nbd-client: Use correct macro parenthesizationEric Blake
If 'bs' is a complex expression, we were only casting the front half rather than the full expression. Luckily, none of the callers were passing bad arguments, but it's better to be robust up front. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20170918214649.17550-1-eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-09-06nbd: Use new qio_channel_*_all() functionsEric Blake
Rather than open-coding our own read/write-all functions, we can make use of the recently-added qio code. It slightly changes the error message in one of the iotests. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20170905191114.5959-4-eblake@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
2017-08-30block/nbd-client: refactor request send/receiveVladimir Sementsov-Ogievskiy
Add nbd_co_request, to remove code duplications in nbd_client_co_{pwrite,pread,...} functions. Also this is needed for further refactoring. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20170804151440.320927-8-vsementsov@virtuozzo.com> [eblake: make nbd_co_request a wrapper, rather than merging two existing functions] Signed-off-by: Eric Blake <eblake@redhat.com>
2017-08-30block/nbd-client: rename nbd_recv_coroutines_enter_allVladimir Sementsov-Ogievskiy
Rename nbd_recv_coroutines_enter_all to nbd_recv_coroutines_wake_all, as it most probably just adds all recv coroutines into co_queue_wakeup, rather than directly enter them. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20170804151440.320927-9-vsementsov@virtuozzo.com> [eblake: tweak commit message] Signed-off-by: Eric Blake <eblake@redhat.com>
2017-08-30block/nbd-client: get rid of ssize_tVladimir Sementsov-Ogievskiy
Use int variable for nbd_co_send_request return value (as nbd_co_send_request returns int). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20170804151440.320927-6-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2017-08-30nbd-client: avoid read_reply_co entry if send failedStefan Hajnoczi
The following segfault is encountered if the NBD server closes the UNIX domain socket immediately after negotiation: Program terminated with signal SIGSEGV, Segmentation fault. #0 aio_co_schedule (ctx=0x0, co=0xd3c0ff2ef0) at util/async.c:441 441 QSLIST_INSERT_HEAD_ATOMIC(&ctx->scheduled_coroutines, (gdb) bt #0 0x000000d3c01a50f8 in aio_co_schedule (ctx=0x0, co=0xd3c0ff2ef0) at util/async.c:441 #1 0x000000d3c012fa90 in nbd_coroutine_end (bs=bs@entry=0xd3c0fec650, request=<optimized out>) at block/nbd-client.c:207 #2 0x000000d3c012fb58 in nbd_client_co_preadv (bs=0xd3c0fec650, offset=0, bytes=<optimized out>, qiov=0x7ffc10a91b20, flags=0) at block/nbd-client.c:237 #3 0x000000d3c0128e63 in bdrv_driver_preadv (bs=bs@entry=0xd3c0fec650, offset=offset@entry=0, bytes=bytes@entry=512, qiov=qiov@entry=0x7ffc10a91b20, flags=0) at block/io.c:836 #4 0x000000d3c012c3e0 in bdrv_aligned_preadv (child=child@entry=0xd3c0ff51d0, req=req@entry=0x7f31885d6e90, offset=offset@entry=0, bytes=bytes@entry=512, align=align@entry=1, qiov=qiov@entry=0x7ffc10a91b20, f +lags=0) at block/io.c:1086 #5 0x000000d3c012c6b8 in bdrv_co_preadv (child=0xd3c0ff51d0, offset=offset@entry=0, bytes=bytes@entry=512, qiov=qiov@entry=0x7ffc10a91b20, flags=flags@entry=0) at block/io.c:1182 #6 0x000000d3c011cc17 in blk_co_preadv (blk=0xd3c0ff4f80, offset=0, bytes=512, qiov=0x7ffc10a91b20, flags=0) at block/block-backend.c:1032 #7 0x000000d3c011ccec in blk_read_entry (opaque=0x7ffc10a91b40) at block/block-backend.c:1079 #8 0x000000d3c01bbb96 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:79 #9 0x00007f3196cb8600 in __start_context () at /lib64/libc.so.6 The problem is that nbd_client_init() uses nbd_client_attach_aio_context() -> aio_co_schedule(new_context, client->read_reply_co). Execution of read_reply_co is deferred to a BH which doesn't run until later. In the mean time blk_co_preadv() can be called and nbd_coroutine_end() calls aio_wake() on read_reply_co. At this point in time read_reply_co's ctx isn't set because it has never been entered yet. This patch simplifies the nbd_co_send_request() -> nbd_co_receive_reply() -> nbd_coroutine_end() lifecycle to just nbd_co_send_request() -> nbd_co_receive_reply(). The request is "ended" if an error occurs at any point. Callers no longer have to invoke nbd_coroutine_end(). This cleanup also eliminates the segfault because we don't call aio_co_schedule() to wake up s->read_reply_co if sending the request failed. It is only necessary to wake up s->read_reply_co if a reply was received. Note this only happens with UNIX domain sockets on Linux. It doesn't seem possible to reproduce this with TCP sockets. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20170829122745.14309-2-stefanha@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2017-08-23nbd-client: avoid spurious qio_channel_yield() re-entryStefan Hajnoczi
The following scenario leads to an assertion failure in qio_channel_yield(): 1. Request coroutine calls qio_channel_yield() successfully when sending would block on the socket. It is now yielded. 2. nbd_read_reply_entry() calls nbd_recv_coroutines_enter_all() because nbd_receive_reply() failed. 3. Request coroutine is entered and returns from qio_channel_yield(). Note that the socket fd handler has not fired yet so ioc->write_coroutine is still set. 4. Request coroutine attempts to send the request body with nbd_rwv() but the socket would still block. qio_channel_yield() is called again and assert(!ioc->write_coroutine) is hit. The problem is that nbd_read_reply_entry() does not distinguish between request coroutines that are waiting to receive a reply and those that are not. This patch adds a per-request bool receiving flag so nbd_read_reply_entry() can avoid spurious aio_wake() calls. Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20170822125113.5025-1-stefanha@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Tested-by: Eric Blake <eblake@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2017-08-23fix build failure in nbd_read_reply_entry()Igor Mammedov
travis builds fail at HEAD at rc3 master with block/nbd-client.c: In function ‘nbd_read_reply_entry’: block/nbd-client.c:110:8: error: ‘ret’ may be used uninitialized in this function [-Werror=uninitialized] fix it by initializing 'ret' to 0 Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-08-15nbd-client: Fix regression when server sends garbageEric Blake
When we switched NBD to use coroutines for qemu 2.9 (in particular, commit a12a712a), we introduced a regression: if a server sends us garbage (such as a corrupted magic number), we quit the read loop but do not stop sending further queued commands, resulting in the client hanging when it never reads the response to those additional commands. In qemu 2.8, we properly detected that the server is no longer reliable, and cancelled all existing pending commands with EIO, then tore down the socket so that all further command attempts get EPIPE. Restore the proper behavior of quitting (almost) all communication with a broken server: Once we know we are out of sync or otherwise can't trust the server, we must assume that any further incoming data is unreliable and therefore end all pending commands with EIO, and quit trying to send any further commands. As an exception, we still (try to) send NBD_CMD_DISC to let the server know we are going away (in part, because it is easier to do that than to further refactor nbd_teardown_connection, and in part because it is the only command where we do not have to wait for a reply). Based on a patch by Vladimir Sementsov-Ogievskiy. A malicious server can be created with the following hack, followed by setting NBD_SERVER_DEBUG to a non-zero value in the environment when running qemu-nbd: | --- a/nbd/server.c | +++ b/nbd/server.c | @@ -919,6 +919,17 @@ static int nbd_send_reply(QIOChannel *ioc, NBDReply *reply, Error **errp) | stl_be_p(buf + 4, reply->error); | stq_be_p(buf + 8, reply->handle); | | + static int debug; | + static int count; | + if (!count++) { | + const char *str = getenv("NBD_SERVER_DEBUG"); | + if (str) { | + debug = atoi(str); | + } | + } | + if (debug && !(count % debug)) { | + buf[0] = 0; | + } | return nbd_write(ioc, buf, sizeof(buf), errp); | } Reported-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20170814213426.24681-1-eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-07-14nbd: Implement NBD_INFO_BLOCK_SIZE on clientEric Blake
The upstream NBD Protocol has defined a new extension to allow the server to advertise block sizes to the client, as well as a way for the client to inform the server whether it intends to obey block sizes. When using the block layer as the client, we will obey block sizes; but when used as 'qemu-nbd -c' to hand off to the kernel nbd module as the client, we are still waiting for the kernel to implement a way for us to learn if it will honor block sizes (perhaps by an addition to sysfs, rather than an ioctl), as well as any way to tell the kernel what additional block sizes to obey (NBD_SET_BLKSIZE appears to be accurate for the minimum size, but preferred and maximum sizes would probably be new ioctl()s), so until then, we need to make our request for block sizes conditional. When using ioctl(NBD_SET_BLKSIZE) to hand off to the kernel, use the minimum block size as the sector size if it is larger than 512, which also has the nice effect of cooperating with (non-qemu) servers that don't do read-modify-write when exposing a block device with 4k sectors; it might also allow us to visit a file larger than 2T on a 32-bit kernel. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20170707203049.534-10-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-07-14nbd: Create struct for tracking export infoEric Blake
The NBD Protocol is introducing some additional information about exports, such as minimum request size and alignment, as well as an advertised maximum request size. It will be easier to feed this information back to the block layer if we gather all the information into a struct, rather than adding yet more pointer parameters during negotiation. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20170707203049.534-2-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-07-04nbd: fix NBD over TLSPaolo Bonzini
When attaching the NBD QIOChannel to an AioContext, the TLS channel should be used, not the underlying socket channel. This is because, trivially, the TLS channel will be the one that we read/write to and thus the one that will get the qio_channel_yield() call. Fixes: ff82911cd3f69f028f2537825c9720ff78bc3f19 Cc: qemu-stable@nongnu.org Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Tested-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-06-26block: change variable names in BlockDriverStateManos Pitsidianakis
Change the 'int count' parameter in *pwrite_zeros, *pdiscard related functions (and some others) to 'int bytes', as they both refer to bytes. This helps with code legibility. Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Message-id: 20170609101808.13506-1-el13635@mail.ntua.gr Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-06-15nbd: rename read_sync and friendsVladimir Sementsov-Ogievskiy
Rename nbd_wr_syncv -> nbd_rwv read_sync -> nbd_read read_sync_eof -> nbd_read_eof write_sync -> nbd_write drop_sync -> nbd_drop 1. nbd_ prefix read_sync and write_sync are already shared, so it is good to have a namespace prefix. drop_sync will be shared, and read_sync_eof is related to read_sync, so let's rename them all. 2. _sync suffix _sync is related to the fact that nbd_wr_syncv doesn't return if a write to socket returns EAGAIN. The first implementation of nbd_wr_syncv (was wr_sync in 7a5ca8648b) just loops while getting EAGAIN, the current implementation yields in this case. Why we want to get rid of it: - it is normal for r/w functions to be synchronous, so having an additional suffix for it looks redundant (contrariwise, we have _aio suffix for async functions) - _sync suffix in block layer is used when function does flush (so using it for other thing is confusing a bit) - keep function names short after adding nbd_ prefix 3. for nbd_wr_syncv let's use more common notation 'rw' Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20170602150150.258222-2-vsementsov@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-06-07nbd: make it thread-safe, fix qcow2 over nbdPaolo Bonzini
NBD is not thread safe, because it accesses s->in_flight without a CoMutex. Fixing this will be required for multiqueue. CoQueue doesn't have spurious wakeups but, when another coroutine can run between qemu_co_queue_next's wakeup and qemu_co_queue_wait's re-locking of the mutex, the wait condition can become false and a loop is necessary. In fact, it turns out that the loop is necessary even without this multi-threaded scenario. A particular sequence of coroutine wakeups is happening ~80% of the time when starting a guest with qcow2 image served over NBD (i.e. qemu-nbd --format=raw, and QEMU's -drive option has -format=qcow2). This patch fixes that issue too. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-06-06nbd/client.c: use errp instead of LOGVladimir Sementsov-Ogievskiy
Move to modern errp scheme from just LOGging errors. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20170526110913.89098-1-vsementsov@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-06-06nbd: add errp parameter to nbd_wr_syncv()Vladimir Sementsov-Ogievskiy
Will be used in following patch to provide actual error message in some cases. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20170516094533.6160-4-vsementsov@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-27nbd-client: fix handling of hungup connectionsPaolo Bonzini
After the switch to reading replies in a coroutine, nothing is reentering pending receive coroutines if the connection hangs. Move nbd_recv_coroutines_enter_all to the reply read coroutine, which is the place where hangups are detected. nbd_teardown_connection can simply wait for the reply read coroutine to detect the hangup and clean up after itself. This wouldn't be enough though because nbd_receive_reply returns 0 (rather than -EPIPE or similar) when reading from a hung connection. Fix the return value check in nbd_read_reply_entry. This fixes qemu-iotests 083. Reported-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20170314111157.14464-1-pbonzini@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-02-21coroutine-lock: add mutex argument to CoQueue APIsPaolo Bonzini
All that CoQueue needs in order to become thread-safe is help from an external mutex. Add this to the API. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20170213181244.16297-6-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-02-21nbd: convert to use qio_channel_yieldPaolo Bonzini
In the client, read the reply headers from a coroutine, switching the read side between the "read header" coroutine and the I/O coroutine that reads the body of the reply. In the server, if the server can read more requests it will create a new "read request" coroutine as soon as a request has been read. Otherwise, the new coroutine is created in nbd_request_put. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170213135235.12274-8-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-01-03aio: add AioPollFn and io_poll() interfaceStefan Hajnoczi
The new AioPollFn io_poll() argument to aio_set_fd_handler() and aio_set_event_handler() is used in the next patch. Keep this code change separate due to the number of files it touches. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20161201192652.9509-3-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-11-22nbd: Allow unmap and fua during write zeroesEric Blake
Commit fa778fff wired up support to send the NBD_CMD_WRITE_ZEROES, but forgot to inform the block layer that FUA unmapping of zeroes is supported. Without BDRV_REQ_MAY_UNMAP listed as a supported flag, the block layer will always insist on the NBD layer passing NBD_CMD_FLAG_NO_HOLE, resulting in the server always allocating things even when it was desired to let the server punch holes. Similarly, failing to set BDRV_REQ_FUA means that the client may send unnecessary NBD_CMD_FLUSH when it could have instead used the NBD_CMD_FLAG_FUA bit. CC: qemu-stable@nongnu.org Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1479413642-22463-2-git-send-email-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-02nbd: Implement NBD_CMD_WRITE_ZEROES on clientEric Blake
Upstream NBD protocol recently added the ability to efficiently write zeroes without having to send the zeroes over the wire, along with a flag to control whether the client wants a hole. The generic block code takes care of falling back to the obvious write of lots of zeroes if we return -ENOTSUP because the server does not have WRITE_ZEROES. Ideally, since NBD_CMD_WRITE_ZEROES does not involve any data over the wire, we want to support transactions that are much larger than the normal 32M limit imposed on NBD_CMD_WRITE. But the server may still have a limit smaller than UINT_MAX, so until experimental NBD protocol additions for advertising various command sizes is finalized (see [1], [2]), for now we just stick to the same limits as normal writes. [1] https://github.com/yoe/nbd/blob/extension-info/doc/proto.md [2] https://sourceforge.net/p/nbd/mailman/message/35081223/ Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1476469998-28592-17-git-send-email-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-02nbd: Rename struct nbd_request and nbd_replyEric Blake
Our coding convention prefers CamelCase names, and we already have other existing structs with NBDFoo naming. Let's be consistent, before later patches add even more structs. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1476469998-28592-6-git-send-email-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-02nbd: Rename NbdClientSession to NBDClientSessionEric Blake
It's better to use consistent capitalization of the namespace used for NBD functions; we have more instances of NBD* than Nbd*. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1476469998-28592-5-git-send-email-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-02nbd: Treat flags vs. command type as separate fieldsEric Blake
Current upstream NBD documents that requests have a 16-bit flags, followed by a 16-bit type integer; although older versions mentioned only a 32-bit field with masking to find flags. Since the protocol is in network order (big-endian over the wire), the ABI is unchanged; but dealing with the flags as a separate field rather than masking will make it easier to add support for upcoming NBD extensions that increase the number of both flags and commands. Improve some comments in nbd.h based on the current upstream NBD protocol (https://github.com/yoe/nbd/blob/master/doc/proto.md), and touch some nearby code to keep checkpatch.pl happy. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1476469998-28592-3-git-send-email-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-01nbd: Use CoQueue for free_sema instead of CoMutexChanglong Xie
NBD is using the CoMutex in a way that wasn't anticipated. For example, if there are N(N=26, MAX_NBD_REQUESTS=16) nbd write requests, so we will invoke nbd_client_co_pwritev N times. ---------------------------------------------------------------------------------------- time request Actions 1 1 in_flight=1, Coroutine=C1 2 2 in_flight=2, Coroutine=C2 ... 15 15 in_flight=15, Coroutine=C15 16 16 in_flight=16, Coroutine=C16, free_sema->holder=C16, mutex->locked=true 17 17 in_flight=16, Coroutine=C17, queue C17 into free_sema->queue 18 18 in_flight=16, Coroutine=C18, queue C18 into free_sema->queue ... 26 N in_flight=16, Coroutine=C26, queue C26 into free_sema->queue ---------------------------------------------------------------------------------------- Once nbd client recieves request No.16' reply, we will re-enter C16. It's ok, because it's equal to 'free_sema->holder'. ---------------------------------------------------------------------------------------- time request Actions 27 16 in_flight=15, Coroutine=C16, free_sema->holder=C16, mutex->locked=false ---------------------------------------------------------------------------------------- Then nbd_coroutine_end invokes qemu_co_mutex_unlock what will pop coroutines from free_sema->queue's head and enter C17. More free_sema->holder is C17 now. ---------------------------------------------------------------------------------------- time request Actions 28 17 in_flight=16, Coroutine=C17, free_sema->holder=C17, mutex->locked=true ---------------------------------------------------------------------------------------- In above scenario, we only recieves request No.16' reply. As time goes by, nbd client will almostly recieves replies from requests 1 to 15 rather than request 17 who owns C17. In this case, we will encounter assert "mutex->holder == self" failed since Kevin's commit 0e438cdc "coroutine: Let CoMutex remember who holds it". For example, if nbd client recieves request No.15' reply, qemu will stop unexpectedly: ---------------------------------------------------------------------------------------- time request Actions 29 15(most case) in_flight=15, Coroutine=C15, free_sema->holder=C17, mutex->locked=false ---------------------------------------------------------------------------------------- Per Paolo's suggestion "The simplest fix is to change it to CoQueue, which is like a condition variable", this patch replaces CoMutex with CoQueue. Cc: Wen Congyang <wency@cn.fujitsu.com> Reported-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com> Message-Id: <1476267508-19499-1-git-send-email-xiecl.fnst@cn.fujitsu.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-20nbd: Convert to byte-based interfaceEric Blake
The NBD protocol doesn't have any notion of sectors, so it is a fairly easy conversion to use byte-based read and write. Signed-off-by: Eric Blake <eblake@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1468624988-423-19-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20nbd: Switch .bdrv_co_discard() to byte-basedEric Blake
Another step towards killing off sector-based block APIs. While at it, call directly into nbd-client.c instead of having a pointless trivial wrapper in nbd.c. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1468624988-423-14-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20nbd: Drop unused offset parameterEric Blake
Now that NBD relies on the block layer to fragment things, we no longer need to track an offset argument for which fragment of a request we are actually servicing. While at it, use true and false instead of 0 and 1 for a bool parameter. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1468607524-19021-6-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20nbd: Rely on block layer to break up large requestsEric Blake
Now that the block layer will honor max_transfer, we can simplify our code to rely on that guarantee. The readv code can call directly into nbd-client, just as the writev code has done since commit 52a4650. Interestingly enough, while qemu-io 'w 0 40m' splits into a 32M and 8M transaction, 'w -z 0 40m' splits into two 16M and an 8M, because the block layer caps the bounce buffer for writing zeroes at 16M. When we later introduce support for NBD_CMD_WRITE_ZEROES, we can get a full 32M zero write (or larger, if the client and server negotiate that write zeroes can use a larger size than ordinary writes). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1468607524-19021-5-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-13coroutine: move entry argument to qemu_coroutine_createPaolo Bonzini
In practice the entry argument is always known at creation time, and it is confusing that sometimes qemu_coroutine_enter is used with a non-NULL argument to re-enter a coroutine (this happens in block/sheepdog.c and tests/test-coroutine.c). So pass the opaque value at creation time, for consistency with e.g. aio_bh_new. Mostly done with the following semantic patch: @ entry1 @ expression entry, arg, co; @@ - co = qemu_coroutine_create(entry); + co = qemu_coroutine_create(entry, arg); ... - qemu_coroutine_enter(co, arg); + qemu_coroutine_enter(co); @ entry2 @ expression entry, arg; identifier co; @@ - Coroutine *co = qemu_coroutine_create(entry); + Coroutine *co = qemu_coroutine_create(entry, arg); ... - qemu_coroutine_enter(co, arg); + qemu_coroutine_enter(co); @ entry3 @ expression entry, arg; @@ - qemu_coroutine_enter(qemu_coroutine_create(entry), arg); + qemu_coroutine_enter(qemu_coroutine_create(entry, arg)); @ reentry @ expression co; @@ - qemu_coroutine_enter(co, NULL); + qemu_coroutine_enter(co); except for the aforementioned few places where the semantic patch stumbled (as expected) and for test_co_queue, which would otherwise produce an uninitialized variable warning. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>