aboutsummaryrefslogtreecommitdiff
path: root/nbd
AgeCommit message (Collapse)Author
2019-09-05nbd: Implement server use of NBD FAST_ZEROEric Blake
The server side is fairly straightforward: we can always advertise support for detection of fast zero, and implement it by mapping the request to the block layer BDRV_REQ_NO_FALLBACK. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190823143726.27062-5-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [eblake: update iotests 223, 233]
2019-09-05nbd: Prepare for NBD_CMD_FLAG_FAST_ZEROEric Blake
Commit fe0480d6 and friends added BDRV_REQ_NO_FALLBACK as a way to avoid wasting time on a preliminary write-zero request that will later be rewritten by actual data, if it is known that the write-zero request will use a slow fallback; but in doing so, could not optimize for NBD. The NBD specification is now considering an extension that will allow passing on those semantics; this patch updates the new protocol bits and 'qemu-nbd --list' output to recognize the bit, as well as the new errno value possible when using the new flag; while upcoming patches will improve the client to use the feature when present, and the server to advertise support for it. The NBD spec recommends (but not requires) that ENOTSUP be avoided for all but failures of a fast zero (the only time it is mandatory to avoid an ENOTSUP failure is when fast zero is supported but not requested during write zeroes; the questionable use is for ENOTSUP to other actions like a normal write request). However, clients that get an unexpected ENOTSUP will either already be treating it the same as EINVAL, or may appreciate the extra bit of information. We were equally loose for returning EOVERFLOW in more situations than recommended by the spec, so if it turns out to be a problem in practice, a later patch can tighten handling for both error codes. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190823143726.27062-3-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [eblake: tweak commit message, also handle EOPNOTSUPP]
2019-09-05nbd: Improve per-export flag handling in serverEric Blake
When creating a read-only image, we are still advertising support for TRIM and WRITE_ZEROES to the client, even though the client should not be issuing those commands. But seeing this requires looking across multiple functions: All callers to nbd_export_new() passed a single flag based solely on whether the export allows writes. Later, we then pass a constant set of flags to nbd_negotiate_options() (namely, the set of flags which we always support, at least for writable images), which is then further dynamically modified with NBD_FLAG_SEND_DF based on client requests for structured options. Finally, when processing NBD_OPT_EXPORT_NAME or NBD_OPT_EXPORT_GO we bitwise-or the original caller's flag with the runtime set of flags we've built up over several functions. Let's refactor things to instead compute a baseline of flags as soon as possible which gets shared between multiple clients, in nbd_export_new(), and changing the signature for the callers to pass in a simpler bool rather than having to figure out flags. We can then get rid of the 'myflags' parameter to various functions, and instead refer to client for everything we need (we still have to perform a bitwise-OR for NBD_FLAG_SEND_DF during NBD_OPT_EXPORT_NAME and NBD_OPT_EXPORT_GO, but it's easier to see what is being computed). This lets us quit advertising senseless flags for read-only images, as well as making the next patch for exposing FAST_ZERO support easier to write. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190823143726.27062-2-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [eblake: improve commit message, update iotest 223]
2019-09-05nbd: Tolerate more errors to structured reply requestEric Blake
A server may have a reason to reject a request for structured replies, beyond just not recognizing them as a valid request; similarly, it may have a reason for rejecting a request for a meta context. It doesn't hurt us to continue talking to such a server; otherwise 'qemu-nbd --list' of such a server fails to display all available details about the export. Encountered when temporarily tweaking nbdkit to reply with NBD_REP_ERR_POLICY. Present since structured reply support was first added (commit d795299b reused starttls handling, but starttls is different in that we can't fall back to other behavior on any error). Note that for an unencrypted client trying to connect to a server that requires encryption, this defers the point of failure to when we finally execute a strict command (such as NBD_OPT_GO or NBD_OPT_LIST), now that the intermediate NBD_OPT_STRUCTURED_REPLY does not diagnose NBD_REP_ERR_TLS_REQD as fatal; but as the protocol eventually gets us to a command where we can't continue onwards, the changed error message doesn't cause any security concerns. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190824172813.29720-3-eblake@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> [eblake: fix iotest 233]
2019-09-05nbd: Use g_autofree in a few placesEric Blake
Thanks to our recent move to use glib's g_autofree, I can join the bandwagon. Getting rid of gotos is fun ;) There are probably more places where we could register cleanup functions and get rid of more gotos; this patch just focuses on the labels that existed merely to call g_free. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190824172813.29720-2-eblake@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2019-09-05nbd: Advertise multi-conn for shared read-only connectionsEric Blake
The NBD specification defines NBD_FLAG_CAN_MULTI_CONN, which can be advertised when the server promises cache consistency between simultaneous clients (basically, rules that determine what FUA and flush from one client are able to guarantee for reads from another client). When we don't permit simultaneous clients (such as qemu-nbd without -e), the bit makes no sense; and for writable images, we probably have a lot more work before we can declare that actions from one client are cache-consistent with actions from another. But for read-only images, where flush isn't changing any data, we might as well advertise multi-conn support. What's more, advertisement of the bit makes it easier for clients to determine if 'qemu-nbd -e' was in use, where a second connection will succeed rather than hang until the first client goes away. This patch affects qemu as server in advertising the bit. We may want to consider patches to qemu as client to attempt parallel connections for higher throughput by spreading the load over those connections when a server advertises multi-conn, but for now sticking to one connection per nbd:// BDS is okay. See also: https://bugzilla.redhat.com/1708300 Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190815185024.7010-1-eblake@redhat.com> [eblake: tweak blockdev-nbd.c to not request shared when writable, fix iotest 233] Reviewed-by: John Snow <jsnow@redhat.com>
2019-08-16block/dirty-bitmap: add bdrv_dirty_bitmap_getJohn Snow
Add a public interface for get. While we're at it, rename "bdrv_get_dirty_bitmap_locked" to "bdrv_dirty_bitmap_get_locked". (There are more functions to rename to the bdrv_dirty_bitmap_VERB form, but they will wait until the conclusion of this series.) Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190709232550.10724-11-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>
2019-08-16Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2019-08-15' into ↵Peter Maydell
staging nbd patches for 2019-08-15 - Addition of InetSocketAddress keep-alive - Addition of BDRV_REQ_PREFETCH for more efficient copy-on-read - Initial refactoring in preparation of NBD reconnect # gpg: Signature made Thu 15 Aug 2019 19:28:41 BST # gpg: using RSA key A7A16B4A2527436A # gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full] # gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full] # gpg: aka "[jpeg image of size 6874]" [full] # Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A * remotes/ericb/tags/pull-nbd-2019-08-15: block/nbd: refactor nbd connection parameters block/nbd: add cmdline and qapi parameter reconnect-delay block/nbd: move from quit to state block/nbd: use non-blocking io channel for nbd negotiation block/nbd: split connection_co start out of nbd_client_connect nbd: improve CMD_CACHE: use BDRV_REQ_PREFETCH block/stream: use BDRV_REQ_PREFETCH block: implement BDRV_REQ_PREFETCH qapi: Add InetSocketAddress member keep-alive Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-08-16Include qemu/main-loop.h lessMarkus Armbruster
In my "build everything" tree, changing qemu/main-loop.h triggers a recompile of some 5600 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). It includes block/aio.h, which in turn includes qemu/event_notifier.h, qemu/notify.h, qemu/processor.h, qemu/qsp.h, qemu/queue.h, qemu/thread-posix.h, qemu/thread.h, qemu/timer.h, and a few more. Include qemu/main-loop.h only where it's needed. Touching it now recompiles only some 1700 objects. For block/aio.h and qemu/event_notifier.h, these numbers drop from 5600 to 2800. For the others, they shrink only slightly. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190812052359.30071-21-armbru@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2019-08-16Include qemu/queue.h slightly lessMarkus Armbruster
Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20190812052359.30071-20-armbru@redhat.com>
2019-08-15block/nbd: use non-blocking io channel for nbd negotiationVladimir Sementsov-Ogievskiy
No reason to use blocking channel for negotiation and we'll benefit in further reconnect feature, as qio_channel reads and writes will do qemu_coroutine_yield while waiting for io completion. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20190618114328.55249-3-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2019-08-15nbd: improve CMD_CACHE: use BDRV_REQ_PREFETCHVladimir Sementsov-Ogievskiy
This helps to avoid extra io, allocations and memory copying. We assume here that CMD_CACHE is always used with copy-on-read, as otherwise it's a noop. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20190725100550.33801-4-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2019-06-13nbd/server: Nicer spelling of max BLOCK_STATUS reply lengthEric Blake
Commit 3d068aff (3.0) introduced NBD_MAX_BITMAP_EXTENTS as a limit on how large we would allow a reply to NBD_CMD_BLOCK_STATUS to grow when it is visiting a qemu:dirty-bitmap: context. Later, commit fb7afc79 (3.1) reused the constant to limit base:allocation context replies, although the name is now less appropriate in that situation. Rename things, and improve the macro to use units.h for better legibility. Then reformat the comment to comply with checkpatch rules added in the meantime. No semantic change. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190510151735.29687-1-eblake@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2019-06-04block: Add BlockBackend.ctxKevin Wolf
This adds a new parameter to blk_new() which requires its callers to declare from which AioContext this BlockBackend is going to be used (or the locks of which AioContext need to be taken anyway). The given context is only stored and kept up to date when changing AioContexts. Actually applying the stored AioContext to the root node is saved for another commit. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-04nbd-server: Call blk_set_allow_aio_context_change()Kevin Wolf
The NBD server uses an AioContext notifier, so it can tolerate that its BlockBackend is switched to a different AioContext. Before we start actually calling bdrv_try_set_aio_context(), which checks for consistency, outside of test cases, we need to make sure that the NBD server actually allows this. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2019-04-08nbd/client: Fix error message for server with unusable sizingEric Blake
Add a missing space to the error message used when giving up on a server that insists on an alignment which renders the last few bytes of the export unreadable. Fixes: 3add3ab78 Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190404145226.32649-1-eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2019-04-08nbd/server: Don't fail NBD_OPT_INFO for byte-aligned sourcesEric Blake
In commit 0c1d50bd, I added a couple of TODO comments about whether we consult bl.request_alignment when responding to NBD_OPT_INFO. At the time, qemu as server was hard-coding an advertised alignment of 512 to clients that promised to obey constraints, and there was no function for getting at a device's preferred alignment. But in hindsight, advertising 512 when the block device prefers 1 caused other compliance problems, and commit b0245d64 changed one of the two TODO comments to advertise a more accurate alignment. Time to fix the other TODO. Doesn't really impact qemu as client (our normal client doesn't use NBD_OPT_INFO, and qemu-nbd --list promises to obey block sizes), but it might prove useful to other clients. Fixes: b0245d64 Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190403030526.12258-4-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2019-04-08nbd/server: Trace client noncompliance on unaligned requestsEric Blake
We've recently added traces for clients to flag server non-compliance; let's do the same for servers to flag client non-compliance. According to the spec, if the client requests NBD_INFO_BLOCK_SIZE, it is promising to send all requests aligned to those boundaries. Of course, if the client does not request NBD_INFO_BLOCK_SIZE, then it made no promises so we shouldn't flag anything; and because we are willing to handle clients that made no promises (the spec allows us to use NBD_REP_ERR_BLOCK_SIZE_REQD if we had been unwilling), we already have to handle unaligned requests (which the block layer already does on our behalf). So even though the spec allows us to return EINVAL for clients that promised to behave, it's easier to always answer unaligned requests. Still, flagging non-compliance can be useful in debugging a client that is trying to be maximally portable. Qemu as client used to have one spot where it sent non-compliant requests: if the server sends an unaligned reply to NBD_CMD_BLOCK_STATUS, and the client was iterating over the entire disk, the next request would start at that unaligned point; this was fixed in commit a39286dd when the client was taught to work around server non-compliance; but is equally fixed if the server is patched to not send unaligned replies in the first place (yes, qemu 4.0 as server still has few such bugs, although they will be patched in 4.1). Fortunately, I did not find any more spots where qemu as client was non-compliant. I was able to test the patch by using the following hack to convince qemu-io to run various unaligned commands, coupled with serving 512-byte alignment by intentionally omitting '-f raw' on the server while viewing server traces. | diff --git i/nbd/client.c w/nbd/client.c | index 427980bdd22..1858b2aac35 100644 | --- i/nbd/client.c | +++ w/nbd/client.c | @@ -449,6 +449,7 @@ static int nbd_opt_info_or_go(QIOChannel *ioc, uint32_t opt, | nbd_send_opt_abort(ioc); | return -1; | } | + info->min_block = 1;//hack | if (!is_power_of_2(info->min_block)) { | error_setg(errp, "server minimum block size %" PRIu32 | " is not a power of two", info->min_block); Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190403030526.12258-3-eblake@redhat.com> [eblake: address minor review nits] Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2019-04-08nbd/server: Fix blockstatus traceEric Blake
Don't increment remaining_bytes until we know that we will actually be including the current block status extent in the reply; otherwise, the value traced will include a bytes value that is oversized by the length of the next block status extent which did not get sent because it instead ended the loop. Fixes: fb7afc79 Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190403030526.12258-2-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2019-04-01nbd/server: Advertise actual minimum block sizeEric Blake
Both NBD_CMD_BLOCK_STATUS and structured NBD_CMD_READ will split their reply according to bdrv_block_status() boundaries. If the block device has a request_alignment smaller than 512, but we advertise a block alignment of 512 to the client, then this can result in the server reply violating client expectations by reporting a smaller region of the export than what the client is permitted to address (although this is less of an issue for qemu 4.0 clients, given recent client patches to overlook our non-compliance at EOF). Since it's always better to be strict in what we send, it is worth advertising the actual minimum block limit rather than blindly rounding it up to 512. Note that this patch is not foolproof - it is still possible to provoke non-compliant server behavior using: $ qemu-nbd --image-opts driver=blkdebug,align=512,image.driver=file,image.filename=/path/to/non-aligned-file That is arguably a bug in the blkdebug driver (it should never pass back block status smaller than its alignment, even if it has to make multiple bdrv_get_status calls and determine the least-common-denominator status among the group to return). It may also be possible to observe issues with a backing layer with smaller alignment than the active layer, although so far I have been unable to write a reliable iotest for that scenario (but again, an issue like that could be argued to be a bug in the block layer, or something where we need a flag to bdrv_block_status() to state whether the result must be aligned to the current layer's limits or can be subdivided for accuracy when chasing backing files). Anyways, as blkdebug is not normally used, and as this patch makes our server more interoperable with qemu 3.1 clients, it is worth applying now, even while we still work on a larger patch series for the 4.1 timeframe to have byte-accurate file lengths. Note that the iotests output changes - for 223 and 233, we can see the server's better granularity advertisement; and for 241, the three test cases have the following effects: - natural alignment: the server's smaller alignment is now advertised, and the hole reported at EOF is now the right result; we've gotten rid of the server's non-compliance - forced server alignment: the server still advertises 512 bytes, but still sends a mid-sector hole. This is still a server compliance bug, which needs to be fixed in the block layer in a later patch; output does not change because the client is already being tolerant of the non-compliance - forced client alignment: the server's smaller alignment means that the client now sees the server's status change mid-sector without any protocol violations, but the fact that the map shows an unaligned mid-sector hole is evidence of the block layer problems with aligned block status, to be fixed in a later patch Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190329042750.14704-7-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [eblake: rebase to enhanced iotest 241 coverage]
2019-04-01nbd/client: Reject inaccessible tail of inconsistent serverEric Blake
The NBD spec suggests that a server should never advertise a size inconsistent with its minimum block alignment, as that tail is effectively inaccessible to a compliant client obeying those block constraints. Since we have a habit of rounding up rather than truncating, to avoid losing the last few bytes of user input, and we cannot access the tail when the server advertises bogus block sizing, abort the connection to alert the server to fix their bug. And rejecting such servers matches what we already did for a min_block that was not a power of 2 or which was larger than max_block. Does not impact either qemu (which always sends properly aligned sizes) or nbdkit (which does not send minimum block requirements yet); so this is mostly aimed at new NBD server implementations, and ensures that the rest of our code can assume the size is aligned. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190330155704.24191-1-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2019-03-22trace-events: Delete unused trace pointsMarkus Armbruster
Tracked down with cleanup-trace-events.pl. Funnies requiring manual post-processing: * block.c and blockdev.c trace points are in block/trace-events. * hw/block/nvme.c uses the preprocessor to hide its trace point use from cleanup-trace-events.pl. * include/hw/xen/xen_common.h trace points are in hw/xen/trace-events. * net/colo-compare and net/filter-rewriter.c use pseudo trace points colo_compare_udp_miscompare and colo_filter_rewriter_debug to guard debug code. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-id: 20190314180929.27722-5-armbru@redhat.com Message-Id: <20190314180929.27722-5-armbru@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-03-22trace-events: Shorten file names in commentsMarkus Armbruster
We spell out sub/dir/ in sub/dir/trace-events' comments pointing to source files. That's because when trace-events got split up, the comments were moved verbatim. Delete the sub/dir/ part from these comments. Gets rid of several misspellings. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20190314180929.27722-3-armbru@redhat.com Message-Id: <20190314180929.27722-3-armbru@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-03-22trace-events: Consistently point to docs/devel/tracing.txtMarkus Armbruster
Almost all trace-events point to docs/devel/tracing.txt in a comment right at the beginning. Touch up the ones that don't. [Updated with Markus' new commit description wording. --Stefan] Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20190314180929.27722-2-armbru@redhat.com Message-Id: <20190314180929.27722-2-armbru@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-03-12block/dirty-bitmaps: add block_dirty_bitmap_check functionJohn Snow
Instead of checking against busy, inconsistent, or read only directly, use a check function with permissions bits that let us streamline the checks without reproducing them in many places. Included in this patch are permissions changes that simply add the inconsistent check to existing permissions call spots, without addressing existing bugs. In general, this means that busy+readonly checks become BDRV_BITMAP_DEFAULT, which checks against all three conditions. busy-only checks become BDRV_BITMAP_ALLOW_RO. Notably, remove allows inconsistent bitmaps, so it doesn't follow the pattern. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190301191545.8728-4-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>
2019-03-12block/dirty-bitmaps: unify qmp_locked and user_locked callsJohn Snow
These mean the same thing now. Unify them and rename the merged call bdrv_dirty_bitmap_busy to indicate semantically what we are describing, as well as help disambiguate from the various _locked and _unlocked versions of bitmap helpers that refer to mutex locks. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190223000614.13894-8-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>
2019-03-12nbd: change error checking order for bitmapsJohn Snow
Check that the bitmap is not in use prior to it checking if it is not enabled/recording guest writes. The bitmap being busy was likely at the behest of the user, so this error has a greater chance of being understood by the user. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190223000614.13894-6-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>
2019-03-06qemu-nbd: add support for authorization of TLS clientsDaniel P. Berrange
Currently any client which can complete the TLS handshake is able to use the NBD server. The server admin can turn on the 'verify-peer' option for the x509 creds to require the client to provide a x509 certificate. This means the client will have to acquire a certificate from the CA before they are permitted to use the NBD server. This is still a fairly low bar to cross. This adds a '--tls-authz OBJECT-ID' option to the qemu-nbd command which takes the ID of a previously added 'QAuthZ' object instance. This will be used to validate the client's x509 distinguished name. Clients failing the authorization check will not be permitted to use the NBD server. For example to setup authorization that only allows connection from a client whose x509 certificate distinguished name is CN=laptop.example.com,O=Example Org,L=London,ST=London,C=GB escape the commas in the name and use: qemu-nbd --object tls-creds-x509,id=tls0,dir=/home/berrange/qemutls,\ endpoint=server,verify-peer=yes \ --object 'authz-simple,id=auth0,identity=CN=laptop.example.com,,\ O=Example Org,,L=London,,ST=London,,C=GB' \ --tls-creds tls0 \ --tls-authz authz0 \ ....other qemu-nbd args... NB: a real shell command line would not have leading whitespace after the line continuation, it is just included here for clarity. Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <20190227162035.18543-2-berrange@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: split long line in --help text, tweak 233 to show that whitespace after ,, in identity= portion is actually okay] Signed-off-by: Eric Blake <eblake@redhat.com>
2019-02-25nbd: Use low-level QIOChannel API in nbd_read_eof()Kevin Wolf
Instead of using the convenience wrapper qio_channel_read_all_eof(), use the lower level QIOChannel API. This means duplicating some code, but we'll need this because this coroutine yield is special: We want it to be interruptible so that nbd_client_attach_aio_context() can correctly reenter the coroutine. This moves the bdrv_dec/inc_in_flight() pair into nbd_read_eof(), so that connection_co will always sit in this exact qio_channel_yield() call when bdrv_drain() returns. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2019-02-25nbd: Move nbd_read_eof() to nbd/client.cKevin Wolf
The only caller of nbd_read_eof() is nbd_receive_reply(), so it doesn't have to live in the header file, but can move next to its caller. Also add the missing coroutine_fn to the function and its caller. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2019-02-11nbd/server: Kill pointless shadowed variableEric Blake
lgtm.com pointed out that commit 678ba275 introduced a shadowed declaration of local variable 'bs'; thankfully, the inner 'bs' obtained by 'blk_bs(blk)' matches the outer one given that we had 'blk_insert_bs(blk, bs, errp)' a few lines earlier, and there are no later uses of 'bs' beyond the scope of the 'if (bitmap)' to care if we change the value stored in 'bs' while traveling the backing chain to find a bitmap. So simply get rid of the extra declaration. Reported-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190207191357.6665-1-eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2019-02-04nbd: generalize usage of nbd_readVladimir Sementsov-Ogievskiy
We generally do very similar things around nbd_read: error_prepend specifying what we have tried to read, and be_to_cpu conversion of integers. So, it seems reasonable to move common things to helper functions, which: 1. simplify code a bit 2. generalize nbd_read error descriptions, all starting with "Failed to read" 3. make it more difficult to forget to convert things from BE Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20190128165830.165170-1-vsementsov@virtuozzo.com> [eblake: rename macro to DEF_NBD_READ_N and formatting tweaks; checkpatch has false positive complaint] Signed-off-by: Eric Blake <eblake@redhat.com>
2019-01-21nbd/client: Work around 3.0 bug for listing meta contextsEric Blake
Commit 3d068aff forgot to advertise available qemu: contexts when the client requests a list with 0 queries. Furthermore, 3.0 shipped with a qemu-img hack of x-dirty-bitmap (commit 216ee365) that _silently_ acts as though the entire image is clean if a requested bitmap is not present. Both bugs have been recently fixed, so that a modern qemu server gives full context output right away, and the client refuses a connection if a requested x-dirty-bitmap was not found. Still, it is likely that there will be users that have to work with a mix of old and new qemu versions, depending on which features get backported where, at which point being able to rely on 'qemu-img --list' output to know for sure whether a given NBD export has the desired dirty bitmap is much nicer than blindly connecting and risking that the entire image may appear clean. We can make our --list code smart enough to work around buggy servers by tracking whether we've seen any qemu: replies in the original 0-query list; if not, repeat with a single query on "qemu:" (which may still have no replies, but then we know for sure we didn't trip up on the server bug). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20190117193658.16413-21-eblake@redhat.com>
2019-01-21nbd/client: Add meta contexts to nbd_receive_export_list()Eric Blake
We want to be able to detect whether a given qemu NBD server is exposing the right export(s) and dirty bitmaps, at least for regression testing. We could use 'nbd-client -l' from the upstream NBD project to list exports, but it's annoying to rely on out-of-tree binaries; furthermore, nbd-client doesn't necessarily know about all of the qemu NBD extensions. Thus, we plan on adding a new mode to qemu-nbd that merely sniffs all possible information from the server during handshake phase, then disconnects and dumps the information. This patch continues the work of the previous patch, by adding the ability to track the list of available meta contexts into NBDExportInfo. It benefits from the recent refactoring patches with a new nbd_list_meta_contexts() that reuses much of the same framework as setting a meta context. Note: a malicious server could exhaust memory of a client by feeding an unending loop of contexts; perhaps we could place a limit on how many we are willing to receive. But this is no different from our earlier analysis on a server sending an unending list of exports, and the death of a client due to memory exhaustion when the client was going to exit soon anyways is not really a denial of service attack. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Richard W.M. Jones <rjones@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20190117193658.16413-19-eblake@redhat.com>
2019-01-21nbd/client: Add nbd_receive_export_list()Eric Blake
We want to be able to detect whether a given qemu NBD server is exposing the right export(s) and dirty bitmaps, at least for regression testing. We could use 'nbd-client -l' from the upstream NBD project to list exports, but it's annoying to rely on out-of-tree binaries; furthermore, nbd-client doesn't necessarily know about all of the qemu NBD extensions. Thus, we plan on adding a new mode to qemu-nbd that merely sniffs all possible information from the server during handshake phase, then disconnects and dumps the information. This patch adds the low-level client code for grabbing the list of exports. It benefits from the recent refactoring patches, in order to share as much code as possible when it comes to doing validation of server replies. The resulting information is stored in an array of NBDExportInfo which has been expanded to any description string, along with a convenience function for freeing the list. Note: a malicious server could exhaust memory of a client by feeding an unending loop of exports; perhaps we should place a limit on how many we are willing to receive. But note that a server could reasonably be serving an export for every file in a large directory, where an arbitrary limit in the client means we can't list anything from such a server; the same happens if we just run until the client fails to malloc() and thus dies by an abort(), where the limit is no longer arbitrary but determined by available memory. Since the client is already planning on being short-lived, it's hard to call this a denial of service attack that would starve off other uses, so it does not appear to be a security issue. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Richard W.M. Jones <rjones@redhat.com> Message-Id: <20190117193658.16413-18-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2019-01-21nbd/client: Refactor nbd_opt_go() to support NBD_OPT_INFOEric Blake
Rename the function to nbd_opt_info_or_go() with an added parameter and slight changes to comments and trace messages, in order to reuse the function for NBD_OPT_INFO. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190117193658.16413-17-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2019-01-21nbd/client: Pull out oldstyle size determinationEric Blake
Another refactoring creating nbd_negotiate_finish_oldstyle() for further reuse during 'qemu-nbd --list'. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Richard W.M. Jones <rjones@redhat.com> Message-Id: <20190117193658.16413-16-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2019-01-21nbd/client: Split handshake into two functionsEric Blake
An upcoming patch will add the ability for qemu-nbd to list the services provided by an NBD server. Share the common code of the TLS handshake by splitting the initial exchange into a separate function, leaving only the export handling in the original function. Functionally, there should be no change in behavior in this patch, although some of the code motion may be difficult to follow due to indentation changes (view with 'git diff -w' for a smaller changeset). I considered an enum for the return code coordinating state between the two functions, but in the end just settled with ample comments. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Richard W.M. Jones <rjones@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20190117193658.16413-15-eblake@redhat.com>
2019-01-21nbd/client: Refactor return of nbd_receive_negotiate()Eric Blake
The function could only ever return 0 or -EINVAL; make this clearer by dropping a useless 'fail:' label. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Richard W.M. Jones <rjones@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20190117193658.16413-14-eblake@redhat.com>
2019-01-21nbd/client: Split out nbd_receive_one_meta_context()Eric Blake
Extract portions of nbd_negotiate_simple_meta_context() to a new function nbd_receive_one_meta_context() that copies the pattern of nbd_receive_list() for performing the argument validation of one reply. The error message when the server replies with more than one context changes slightly, but that shouldn't happen in the common case. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20190117193658.16413-13-eblake@redhat.com>
2019-01-21nbd/client: Split out nbd_send_meta_query()Eric Blake
Refactor nbd_negotiate_simple_meta_context() to pull out the code that can be reused to send a LIST request for 0 or 1 query. No semantic change. The old comment about 'sizeof(uint32_t)' being equivalent to '/* number of queries */' is no longer needed, now that we are computing 'sizeof(queries)' instead. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Richard W.M. Jones <rjones@redhat.com> Message-Id: <20190117193658.16413-12-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2019-01-21nbd/client: Change signature of nbd_negotiate_simple_meta_context()Eric Blake
Pass 'info' instead of three separate parameters related to info, when requesting the server to set the meta context. Update the NBDExportInfo struct to rename the received id field to match the fact that we are currently overloading the field to match whatever context the user supplied through the x-dirty-bitmap hack, as well as adding a TODO comment to remind future patches about a desire to request two contexts at once. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Richard W.M. Jones <rjones@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20190117193658.16413-11-eblake@redhat.com>
2019-01-21nbd/client: Move export name into NBDExportInfoEric Blake
Refactor the 'name' parameter of nbd_receive_negotiate() from being a separate parameter into being part of the in-out 'info'. This also spills over to a simplification of nbd_opt_go(). The main driver for this refactoring is that an upcoming patch would like to add support to qemu-nbd to list information about all exports available on a server, where the name(s) will be provided by the server instead of the client. But another benefit is that we can now allow the client to explicitly specify the empty export name "" even when connecting to an oldstyle server (even if qemu is no longer such a server after commit 7f7dfe2a). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Richard W.M. Jones <rjones@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20190117193658.16413-10-eblake@redhat.com>
2019-01-21nbd/client: Refactor nbd_receive_list()Eric Blake
Right now, nbd_receive_list() is only called by nbd_receive_query_exports(), which in turn is only called if the server lacks NBD_OPT_GO but has working option negotiation, and is merely used as a quality-of-implementation trick since servers can't give decent errors for NBD_OPT_EXPORT_NAME. However, servers that lack NBD_OPT_GO are becoming increasingly rare (nbdkit was a latecomer, in Aug 2018, but qemu has been such a server since commit f37708f6 in July 2017 and released in 2.10), so it no longer makes sense to micro-optimize that function for performance. Furthermore, when debugging a server's implementation, tracing the full reply (both names and descriptions) is useful, not to mention that upcoming patches adding 'qemu-nbd --list' will want to collect that data. And when you consider that a server can send an export name up to the NBD protocol length limit of 4k; but our current NBD_MAX_NAME_SIZE is only 256, we can't trace all valid server names without more storage, but 4k is large enough that the heap is better than the stack for long names. Thus, I'm changing the division of labor, with nbd_receive_list() now always malloc'ing a result on success (the malloc is bounded by the fact that we reject servers with a reply length larger than 32M), and moving the comparison to 'wantname' to the caller. There is a minor change in behavior where a server with 0 exports (an immediate NBD_REP_ACK reply) is now no longer distinguished from a server without LIST support (NBD_REP_ERR_UNSUP); this information could be preserved with a complication to the calling contract to provide a bit more information, but I didn't see the point. After all, the worst that can happen if our guess at a match is wrong is that the caller will get a cryptic disconnect when NBD_OPT_EXPORT_NAME fails (which is no different from what would happen if we had not tried LIST), while treating an empty list as immediate failure would prevent connecting to really old servers that really did lack LIST. Besides, NBD servers with 0 exports are rare (qemu can do it when using QMP nbd-server-start without nbd-server-add - but qemu understands NBD_OPT_GO and thus won't tickle this change in behavior). Fix the spelling of foundExport to match coding standards while in the area. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Richard W.M. Jones <rjones@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20190117193658.16413-9-eblake@redhat.com>
2019-01-21nbd/server: Favor [u]int64_t over off_tEric Blake
Although our compile-time environment is set up so that we always support long files with 64-bit off_t, we have no guarantee whether off_t is the same type as int64_t. This requires casts when printing values, and prevents us from directly using qemu_strtoi64() (which will be done in the next patch). Let's just flip to uint64_t where possible, and stick to int64_t for detecting failure of blk_getlength(); we also keep the assertions added in the previous patch that the resulting values fit in 63 bits. The overflow check in nbd_co_receive_request() was already sane (request->from is validated to fit in 63 bits, and request->len is 32 bits, so the addition can't overflow 64 bits), but rewrite it in a form easier to recognize as a typical overflow check. Rename the variable 'description' to keep line lengths reasonable. Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190117193658.16413-7-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2019-01-21nbd/server: Hoist length check to qmp_nbd_server_addEric Blake
We only had two callers to nbd_export_new; qemu-nbd.c always passed a valid offset/length pair (because it already checked the file length, to ensure that offset was in bounds), while blockdev-nbd.c always passed 0/-1. Then nbd_export_new reduces the size to a multiple of BDRV_SECTOR_SIZE (can only happen when offset is not sector-aligned, since bdrv_getlength() currently rounds up) (someday, it would be nice to have byte-accurate lengths - but not today). However, I'm finding it easier to work with the code if we are consistent on having both callers pass in a valid length, and just assert that things are sane in nbd_export_new, meaning that no negative values were passed, and that offset+size does not exceed 63 bits (as that really is a fundamental limit to later operations, whether we use off_t or uint64_t). Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190117193658.16413-6-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2019-01-15dirty-bitmap: improve bdrv_dirty_bitmap_next_zeroVladimir Sementsov-Ogievskiy
Add bytes parameter to the function, to limit searched range. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2019-01-14nbd: Merge nbd_export_bitmap into nbd_export_newEric Blake
We only have one caller that wants to export a bitmap name, which it does right after creation of the export. But there is still a brief window of time where an NBD client could see the export but not the dirty bitmap, which a robust client would have to interpret as meaning the entire image should be treated as dirty. Better is to eliminate the window entirely, by inlining nbd_export_bitmap() into nbd_export_new(), and refusing to create the bitmap in the first place if the requested bitmap can't be located. We also no longer need logic for setting a different bitmap name compared to the bitmap being exported. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20190111194720.15671-8-eblake@redhat.com>
2019-01-14nbd: Merge nbd_export_set_name into nbd_export_newEric Blake
The existing NBD code had a weird split where nbd_export_new() created an export but did not add it to the list of exported names until a later nbd_export_set_name() came along and grabbed a second reference on the object; later, the first call to nbd_export_close() drops the second reference while removing the export from the list. This is in part because the QAPI NbdServerRemoveNode enum documents the possibility of adding a mode where we could do a soft disconnect: preventing new clients, but waiting for existing clients to gracefully quit, based on the mode used when calling nbd_export_close(). But in spite of all that, note that we never change the name of an NBD export while it is exposed, which means it is easier to just inline the process of setting the name as part of creating the export. Inline the contents of nbd_export_set_name() and nbd_export_set_description() into the two points in an export lifecycle where they matter, then adjust both callers to pass the name up front. Note that for creation, all callers pass a non-NULL name, (passing NULL at creation was for old style servers, but we removed support for that in commit 7f7dfe2a), so we can add an assert and do things unconditionally; but for cleanup, because of the dual nature of nbd_export_close(), we still have to be careful to avoid use-after-free. Along the way, add a comment reminding ourselves of the potential of adding a middle mode disconnect. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20190111194720.15671-5-eblake@redhat.com>
2019-01-14nbd: Only require disabled bitmap for read-only exportsEric Blake
Our initial implementation of x-nbd-server-add-bitmap put in a restriction because of incremental backups: in that usage, we are exporting one qcow2 file (the temporary overlay target of a blockdev-backup sync:none job) and a dirty bitmap owned by a second qcow2 file (the source of the blockdev-backup, which is the backing file of the temporary). While both qcow2 files are still writable (the target in order to capture copy-on-write of old contents, and the source in order to track live guest writes in the meantime), the NBD client expects to see constant data, including the dirty bitmap. An enabled bitmap in the source would be modified by guest writes, which is at odds with the NBD export being a read-only constant view, hence the initial code choice of enforcing a disabled bitmap (the intent is that the exposed bitmap was disabled in the same transaction that started the blockdev-backup job, although we don't want to track enough state to actually enforce that). However, consider the case of a bitmap contained in a read-only node (including when the bitmap is found in a backing layer of the active image). Because the node can't be modified, the bitmap won't change due to writes, regardless of whether it is still enabled. Forbidding the export unless the bitmap is disabled is awkward, paritcularly since we can't change the bitmap to be disabled (because the node is read-only). Alternatively, consider the case of live storage migration, where management directs the destination to create a writable NBD server, then performs a drive-mirror from the source to the target, prior to doing the rest of the live migration. Since storage migration can be time-consuming, it may be wise to let the destination include a dirty bitmap to track which portions it has already received, where even if the migration is interrupted and restarted, the source can query the destination block status in order to potentially minimize re-sending data that has not changed in the meantime on a second attempt. Such code has not been written, and might not be trivial (after all, a cluster being marked dirty in the bitmap does not necessarily guarantee it has the desired contents), but it makes sense that letting an active dirty bitmap be exposed and changing alongside writes may prove useful in the future. Solve both issues by gating the restriction against a disabled bitmap to only happen when the caller has requested a read-only export, and where the BDS that owns the bitmap (whether or not it is the BDS handed to nbd_export_new() or from its backing chain) is still writable. We could drop the check altogether (if management apps are prepared to deal with a changing bitmap even on a read-only image), but for now keeping a check for the read-only case still stands a chance of preventing management errors. Update iotest 223 to show the looser behavior by leaving a bitmap enabled the whole run; note that we have to tear down and re-export a node when handling an error. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190111194720.15671-4-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>