aboutsummaryrefslogtreecommitdiff
path: root/block
AgeCommit message (Collapse)Author
2019-02-25block/nfs: Implement bdrv_dirname()Max Reitz
While the basic idea is obvious and could be handled by the default bdrv_dirname() implementation, we cannot generate a directory name if the gid or uid are set, so we have to explicitly return NULL in those cases. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190201192935.18394-19-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25block/nbd: Make bdrv_dirname() return NULLMax Reitz
The generic bdrv_dirname() implementation would be able to generate some form of directory name for many NBD nodes, but it would be always wrong. Therefore, we have to explicitly make it an error (until NBD has some form of specification for export paths, if it ever will). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20190201192935.18394-18-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25quorum: Make bdrv_dirname() return NULLMax Reitz
While the common implementation for bdrv_dirname() should return NULL for quorum BDSs already (because they do not have a file node and their exact_filename field should be empty), there is no reason not to make that explicit. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190201192935.18394-17-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25blkverify: Make bdrv_dirname() return NULLMax Reitz
blkverify's BDSs have a file BDS, but we do not want this to be preferred over the raw node. There is no way to decide between the two (and not really a reason to, either), so just return NULL in blkverify's implementation of bdrv_dirname(). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190201192935.18394-16-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25block: bdrv_get_full_backing_filename's ret. val.Max Reitz
Make bdrv_get_full_backing_filename() return an allocated string instead of placing the result in a caller-provided buffer. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190201192935.18394-12-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25block: bdrv_get_full_backing_filename_from_...'s ret. val.Max Reitz
Make bdrv_get_full_backing_filename_from_filename() return an allocated string instead of placing the result in a caller-provided buffer. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190201192935.18394-11-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25block: Make path_combine() return the pathMax Reitz
Besides being safe for arbitrary path lengths, after some follow-up patches all callers will want a freshly allocated buffer anyway. In the meantime, path_combine_deprecated() is added which has the same interface as path_combine() had before this patch. All callers to that function will be converted in follow-up patches. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 20190201192935.18394-10-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25block: Add BDS.auto_backing_fileMax Reitz
If the backing file is overridden, this most probably does change the guest-visible data of a BDS. Therefore, we will need to consider this in bdrv_refresh_filename(). To see whether it has been overridden, we might want to compare bs->backing_file and bs->backing->bs->filename. However, bs->backing_file is changed by bdrv_set_backing_hd() (which is just used to change the backing child at runtime, without modifying the image header), so bs->backing_file most of the time simply contains a copy of bs->backing->bs->filename anyway, so it is useless for such a comparison. This patch adds an auto_backing_file BDS field which contains the backing file path as indicated by the image header, which is not changed by bdrv_set_backing_hd(). Because of bdrv_refresh_filename() magic, however, a BDS's filename may differ from what has been specified during bdrv_open(). Then, the comparison between bs->auto_backing_file and bs->backing->bs->filename may fail even though bs->backing was opened from bs->auto_backing_file. To mitigate this, we can copy the real BDS's filename (after the whole bdrv_open() and bdrv_refresh_filename() process) into bs->auto_backing_file, if we know the former has been opened based on the latter. This is only possible if no options modifying the backing file's behavior have been specified, though. To simplify things, this patch only copies the filename from the backing file if no options have been specified for it at all. Furthermore, there are cases where an overlay is created by qemu which already contains a BDS's filename (e.g. in blockdev-snapshot-sync). We do not need to worry about updating the overlay's bs->auto_backing_file there, because we actually wrote a post-bdrv_refresh_filename() filename into the image header. So all in all, there will be false negatives where (as of a future patch) bdrv_refresh_filename() will assume that the backing file differs from what was specified in the image header, even though it really does not. However, these cases should be limited to where (1) the user actually did override something in the backing chain (e.g. by specifying options for the backing file), or (2) the user executed a QMP command to change some node's backing file (e.g. change-backing-file or block-commit with @backing-file given) where the given filename does not happen to coincide with qemu's idea of the backing BDS's filename. Then again, (1) really is limited to -drive. With -blockdev or blockdev-add, you have to adhere to the schema, so a user cannot give partial "unimportant" options (e.g. by just setting backing.node-name and leaving the rest to the image header). Therefore, trying to fix this would mean trying to fix something for -drive only. To improve on (2), we would need a full infrastructure to "canonicalize" an arbitrary filename (+ options), so it can be compared against another. That seems a bit over the top, considering that filenames nowadays are there mostly for the user's entertainment. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190201192935.18394-5-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25block: Use children list in bdrv_refresh_filenameMax Reitz
bdrv_refresh_filename() should invoke itself recursively on all children, not just on file. With that change, we can remove the manual invocations in blkverify, quorum, commit, mirror, and blklogwrites. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190201192935.18394-3-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25block: Use bdrv_refresh_filename() to pullMax Reitz
Before this patch, bdrv_refresh_filename() is used in a pushing manner: Whenever the BDS graph is modified, the parents of the modified edges are supposed to be updated (recursively upwards). However, that is nonviable, considering that we want child changes not to concern parents. Also, in the long run we want a pull model anyway: Here, we would have a bdrv_filename() function which returns a BDS's filename, freshly constructed. This patch is an intermediate step. It adds bdrv_refresh_filename() calls before every place a BDS.filename value is used. The only exceptions are protocol drivers that use their own filename, which clearly would not profit from refreshing that filename before. Also, bdrv_get_encrypted_filename() is removed along the way (as a user of BDS.filename), since it is completely unused. In turn, all of the calls to bdrv_refresh_filename() before this patch are removed, because we no longer have to call this function on graph changes. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190201192935.18394-2-mreitz@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25block/nvme: Remove QEMU_PACKED from naturally aligned NVMeRegs structThomas Huth
The QEMU_PACKED is causing a compiler warning/error with GCC 9: CC block/nvme.o block/nvme.c: In function ‘nvme_create_queue_pair’: block/nvme.c:209:22: error: taking address of packed member of ‘struct <anonymous>’ may result in an unaligned pointer value [-Werror=address-of-packed-member] 209 | q->sq.doorbell = &s->regs->doorbells[idx * 2 * s->doorbell_scale]; All members of the struct are naturally aligned, so there should not be the need for QEMU_PACKED here, and the following QEMU_BUILD_BUG_ON also ensures that there is no padding. Thus simply remove the QEMU_PACKED here. Buglink: https://bugs.launchpad.net/qemu/+bug/1817525 Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-02-25qcow2: Assert that L2 table offsets fit in the L1 tableAlberto Garcia
L1 table entries have a field to store the offset of an L2 table. The rest of the bits of the entry are currently reserved except from bit 63, which stores the COPIED flag. The offset is always taken from the entry using L1E_OFFSET_MASK to ensure that we only use the bits that belong to that field. While that mask is used every time we read from the L1 table, it is never used when we write to it. Due to the limits set elsewhere in the code QEMU can never produce L2 table offsets that don't fit in that field so any such offset when allocating an L2 table would indicate a bug in QEMU. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-02-25nbd: Increase bs->in_flight during AioContext switchKevin Wolf
bdrv_drain() must not leave connection_co scheduled, so bs->in_flight needs to be increased while the coroutine is waiting to be scheduled in the new AioContext after nbd_client_attach_aio_context(). Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-02-25nbd: Use low-level QIOChannel API in nbd_read_eof()Kevin Wolf
Instead of using the convenience wrapper qio_channel_read_all_eof(), use the lower level QIOChannel API. This means duplicating some code, but we'll need this because this coroutine yield is special: We want it to be interruptible so that nbd_client_attach_aio_context() can correctly reenter the coroutine. This moves the bdrv_dec/inc_in_flight() pair into nbd_read_eof(), so that connection_co will always sit in this exact qio_channel_yield() call when bdrv_drain() returns. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2019-02-25nbd: Restrict connection_co reentranceKevin Wolf
nbd_client_attach_aio_context() schedules connection_co in the new AioContext and this way reenters it in any arbitrary place that has yielded. We can restrict this a bit to the function call where the coroutine actually sits waiting when it's idle. This doesn't solve any bug yet, but it shows where in the code we need to support this random reentrance and where we don't have to care. Add FIXME comments for the existing bugs that the rest of this series will fix. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2019-02-25block-backend: Make blk_inc/dec_in_flight publicKevin Wolf
For some users of BlockBackends, just increasing the in_flight counter is easier than implementing separate handlers in BlockDevOps. Make the helper functions for this public. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-02-25commit: Replace commit_top_bs on failure after deleting the block jobAlberto Garcia
If there's an error in commit_start() then the block job must be deleted before replacing commit_top_bs, otherwise it will fail because of lack of permissions. This happens since the permission system was introduced in 8dfba2797761d8a43744e4e6571c8175e448a478. Fortunately this bug doesn't seem to be possible to reproduce at the moment without changing the code. Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-02-25qcow2-snapshot: remove redundant find_snapshot_by_id_and_name callDaniel Henrique Barboza
In qcow2_snapshot_create there is the following code block: /* Generate an ID */ find_new_snapshot_id(bs, sn_info->id_str, sizeof(sn_info->id_str)); /* Check that the ID is unique */ if (find_snapshot_by_id_and_name(bs, sn_info->id_str, NULL) >= 0) { return -EEXIST; } find_new_snapshot_id cycles through all snapshots, getting the id_str as an unsigned long int, calculating the max id_max value of all the existing id_strs and writing in the id_str pointer id_max + 1: for(i = 0; i < s->nb_snapshots; i++) { sn = s->snapshots + i; id = strtoul(sn->id_str, NULL, 10); if (id > id_max) id_max = id; } snprintf(id_str, id_str_size, "%lu", id_max + 1); Here, sn_info->id_str will have the unique value id_max + 1. Right after that, find_snapshot_by_id_and_name is called with id = sn_info->id_str and name = NULL. This will cause the function to execute the following: } else if (id) { for (i = 0; i < s->nb_snapshots; i++) { if (!strcmp(s->snapshots[i].id_str, id)) { return i; } } } In short, we're searching the existing snapshots to see if sn_info->id_str matches any existing id, right after we set in the previous line a sn_info->id_str value that is already unique. The first code block goes way back to commit 585f8587ad, a 2006 commit from Fabrice Bellard that simply says "new qcow2 disk image format". No more info is provided about this logic in any subsequent commits that moved this code block around. I can't say about the original design, but the current logic is redundant. bdrv_snapshot_create is called in aio_context lock, forbidding any concurrent call to accidentally create a new snapshot between the find_new_snapshot_id and find_snapshot_by_id_and_name calls. What we're ending up doing is to cycle through the snapshots two times for no viable reason. This patch eliminates the redundancy by removing the 'id is unique' check that calls find_snapshot_by_id_and_name. Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-02-25block/snapshot: remove bdrv_snapshot_delete_by_id_or_nameDaniel Henrique Barboza
After the previous patch, the only instance of this function left is inside qemu-img.c. qemu-img is using it inside the 'img_snapshot' function to delete snapshots in the SNAPSHOT_DELETE case, based on a "snapshot_name" string that refers to the tag, not ID, of the QEMUSnapshotInfo struct. This can be verified by checking the SNAPSHOT_CREATE case that comes shortly before SNAPSHOT_DELETE. In that case, the same "snapshot_name" variable is being strcpy to the 'name' field of the QEMUSnapshotInfo struct sn: pstrcpy(sn.name, sizeof(sn.name), snapshot_name); Based on that, it is unlikely that "snapshot_name" might contain an "id" in SNAPSHOT_DELETE. This patch changes SNAPSHOT_DELETE to use snapshot_find() and snapshot_delete() instead of bdrv_snapshot_delete_by_id_or_name. After that, there is no instances left of bdrv_snapshot_delete_by_id_or_name in the code, so it is safe to remove it entirely. Suggested-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-02-25block/snapshot.c: eliminate use of ID input in snapshot operationsDaniel Henrique Barboza
At this moment, QEMU attempts to create/load/delete snapshots by using either an ID (id_str) or a name. The problem is that the code isn't consistent of whether the entered argument is an ID or a name, causing unexpected behaviors. For example, when creating snapshots via savevm <arg>, what happens is that "arg" is treated as both name and id_str. In a guest without snapshots, create a single snapshot via savevm: (qemu) savevm 0 (qemu) info snapshots List of snapshots present on all disks: ID TAG VM SIZE DATE VM CLOCK -- 0 741M 2018-07-31 13:39:56 00:41:25.313 A snapshot with name "0" is created. ID is hidden from the user, but the ID is a non-zero integer that starts at "1". Thus, this snapshot has id_str=1, TAG="0". Creating a second snapshot with arg = 1, the first one is deleted: (qemu) savevm 1 (qemu) info snapshots List of snapshots present on all disks: ID TAG VM SIZE DATE VM CLOCK -- 1 741M 2018-07-31 13:42:14 00:41:55.252 What happened? - when creating the second snapshot, a verification is done inside bdrv_all_delete_snapshot to delete any existing snapshots that matches an string argument. Here, the code calls bdrv_all_delete_snapshot("1", ...); - bdrv_all_delete_snapshot calls bdrv_snapshot_find(..., "1") for each BlockDriverState of the guest. And this is where things goes tilting: bdrv_snapshot_find does a search by both id_str and name. It finds out that there is a snapshot that has id_str = 1, stores a reference to the snapshot in the sn_info pointer and then returns match found; - since a match was found, a call to bdrv_snapshot_delete_by_id_or_name() is made. This function ignores the pointer written by bdrv_snapshot_find. Instead, it deletes the snapshot using bdrv_snapshot_delete() calling it first with id_str = 1. If it fails to delete, then it calls it again with name = 1. - after all that, QEMU creates the new snapshot, that has id_str = 1 and name = 1. The user is left wondering that happened with the first snapshot created. Similar bugs can be triggered when using loadvm and delvm. Before contemplating discarding the use of ID input in these operations, I've searched the code of what would be the implications. My findings are: - the RBD and Sheepdog drivers don't care. Both uses the 'name' field as key in their logic, making id_str = name when appropriate. replay-snapshot.c does not make any special use of id_str; - qcow2 uses id_str as an unique identifier but it is automatically calculated, not being influenced by user input. Other than that, there are no distinguish operations made only with id_str; - in blockdev.c, the delete operation uses a match of both id_str AND name. Given that id_str is either a copy of 'name' or auto-generated, we're fine here. This gives motivation to not consider ID as a valid user input in HMP commands - sticking with 'name' input only is more consistent. To accomplish that, the following changes were made in this patch: - bdrv_snapshot_find() does not match for id_str anymore, only 'name'. The function is called in save_snapshot(), load_snapshot(), bdrv_all_delete_snapshot() and bdrv_all_find_snapshot(). This change makes the search function more predictable and does not change the behavior of any underlying code that uses these affected functions, which are related to HMP (which is fine) and the main loop inside vl.c (which doesn't care about it anyways); - bdrv_all_delete_snapshot() does not call bdrv_snapshot_delete_by_id_or_name anymore. Instead, it uses the pointer returned by bdrv_snapshot_find to erase the snapshot with the exact match of id_str an name. This function is called in save_snapshot and hmp_delvm, thus this change produces the intended effect; - documentation changes to reflect the new behavior. I consider this to be an API fix instead of an API change - the user was already creating snapshots using 'name', but now he/she will also enjoy a consistent behavior. Ideally we would get rid of the id_str field entirely, but this would have repercussions on existing snapshots. Another day perhaps. Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-02-19block/dirty-bitmap: Documentation and Comment fixupsJohn Snow
The meaning of the states has changed subtly over time, this should bring the understanding more in-line with the current, actual usages. Reported-by: Eric Blake <eblake@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190202011048.12343-1-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>
2019-02-19dirty-bitmap: Expose persistent flag to 'query-block'Eric Blake
Since qemu currently doesn't flush persistent bitmaps to disk until shutdown (which might be MUCH later), it's useful if 'query-block' at least shows WHICH bitmaps will (eventually) make it to persistent storage. Update affected iotests. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190204210512.27458-1-eblake@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>
2019-02-11qcow2: Add list of bitmaps to ImageInfoSpecificQCow2Andrey Shinkevich
In the 'Format specific information' section of the 'qemu-img info' command output, the supplemental information about existing QCOW2 bitmaps will be shown, such as a bitmap name, flags and granularity: image: /vz/vmprivate/VM1/harddisk.hdd file format: qcow2 virtual size: 64G (68719476736 bytes) disk size: 3.0M cluster_size: 1048576 Format specific information: compat: 1.1 lazy refcounts: true bitmaps: [0]: flags: [0]: in-use [1]: auto name: back-up1 granularity: 65536 [1]: flags: [0]: in-use [1]: auto name: back-up2 granularity: 65536 refcount bits: 16 corrupt: false Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <1549638368-530182-3-git-send-email-andrey.shinkevich@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2019-02-11bdrv_query_image_info Error parameter addedAndrey Shinkevich
Inform a user in case qcow2_get_specific_info fails to obtain QCOW2 image specific information. This patch is preliminary to the one "qcow2: Add list of bitmaps to ImageInfoSpecificQCow2". Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <1549638368-530182-2-git-send-email-andrey.shinkevich@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2019-02-04block/nbd-client: rename read_reply_co to connection_coVladimir Sementsov-Ogievskiy
This coroutine will serve nbd reconnects, so, rename it to be something more generic. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20190201130138.94525-7-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2019-02-04block/nbd-client: don't check iocVladimir Sementsov-Ogievskiy
We have several paranoid checks for ioc != NULL. But ioc may become NULL only on close, which should not happen during requests handling. Also, we check ioc only sometimes, not after each yield, which is inconsistent. Let's drop these checks. However, for safety, let's leave asserts instead. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20190201130138.94525-6-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2019-02-04block/nbd-client: fix nbd_reply_chunk_iter_receiveVladimir Sementsov-Ogievskiy
Use exported report, not the variable to be reused (should not really matter). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20190201130138.94525-5-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2019-02-04block/nbd-client: split connection from initializationVladimir Sementsov-Ogievskiy
Split connection code to reuse it for reconnect. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20190201130138.94525-4-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2019-02-04block/nbd: move connection code from block/nbd to block/nbd-clientVladimir Sementsov-Ogievskiy
Keep all connection code in one file, to be able to implement reconnect in further patches. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20190201130138.94525-3-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: format tweak] Signed-off-by: Eric Blake <eblake@redhat.com>
2019-02-04block/nbd-client: split channel errors from export errorsVladimir Sementsov-Ogievskiy
To implement nbd reconnect in further patches, we need to distinguish error codes, returned by nbd server, from channel errors, to reconnect only in the latter case. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20190201130138.94525-2-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2019-02-04nbd: generalize usage of nbd_readVladimir Sementsov-Ogievskiy
We generally do very similar things around nbd_read: error_prepend specifying what we have tried to read, and be_to_cpu conversion of integers. So, it seems reasonable to move common things to helper functions, which: 1. simplify code a bit 2. generalize nbd_read error descriptions, all starting with "Failed to read" 3. make it more difficult to forget to convert things from BE Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20190128165830.165170-1-vsementsov@virtuozzo.com> [eblake: rename macro to DEF_NBD_READ_N and formatting tweaks; checkpatch has false positive complaint] Signed-off-by: Eric Blake <eblake@redhat.com>
2019-02-01block: Eliminate the S_1KiB, S_2KiB, ... macrosMarkus Armbruster
We define 54 macros for the powers of two >= 1024. We use six, in six macro definitions. Four of them could just as well use the common MiB macro, so do that. The remaining two can't, because they get passed to stringify. Replace the macro by the literal number there. Slightly harder to read in one instance (1048576 vs. S_1MiB), so add a comment there. The other instance is a wash: 65536 vs S_64KiB. 65536 has been good enough for more than seven years there. This effectively reverts commit 540b8492618 and 1240ac558d3. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-02-01block: Remove blk_attach_dev_legacy() / legacy_dev codeThomas Huth
The last user of blk_attach_dev_legacy() was the code in xen_disk which has recently been reworked. Now there is no user for this legacy function anymore. Thus we can finally remove all code related to the "legacy_dev" flag, too, and turn the related "void *" in block-backend.c into proper "DeviceState *" to fix some of the remaining TODOs there. Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-02-01uuid: Make qemu_uuid_bswap() take and return a QemuUUIDPeter Maydell
Currently qemu_uuid_bswap() takes a pointer to the QemuUUID to be byte-swapped. This means it can't be used when the UUID to be swapped is in a packed member of a struct. It's also out of line with the general bswap*() functions we provide in bswap.h, which take the value to be swapped and return it. Make qemu_uuid_bswap() take a QemuUUID and return the swapped version. This fixes some clang warnings about taking the address of a packed struct member in block/vdi.c. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-02-01block/vdi: Don't take address of fields in packed structsPeter Maydell
Taking the address of a field in a packed struct is a bad idea, because it might not be actually aligned enough for that pointer type (and thus cause a crash on dereference on some host architectures). Newer versions of clang warn about this. Instead of passing UUID related functions the address of a possibly unaligned QemuUUID struct, use local variables and then copy to/from the struct field as appropriate. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-02-01block/vpc: Don't take address of fields in packed structsPeter Maydell
Taking the address of a field in a packed struct is a bad idea, because it might not be actually aligned enough for that pointer type (and thus cause a crash on dereference on some host architectures). Newer versions of clang warn about this. Avoid the bug by generating the UUID into a local variable which is definitely safely aligned and then copying it into place. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-02-01vmdk: Reject excess extents in blockdev-createKevin Wolf
Clarify that the number of extents provided in BlockdevCreateOptionsVmdk must match the number of extents that will actually be used. Providing more extents will result in an error now. This requires adapting the test case to provide the right number of extents. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com>
2019-02-01vmdk: Implement .bdrv_co_create callbackFam Zheng
This makes VMDK support blockdev-create. The implementation reuses the image creation code in vmdk_co_create_opts which now acceptes a callback pointer to "retrieve" BlockBackend pointers from the caller. This way we separate the logic between file/extent acquisition and initialization. The QAPI command parameters are mostly the same as the old create_opts except the dropped legacy @compat6 switch, which is redundant with @hwversion. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-02-01vmdk: Refactor vmdk_create_extentFam Zheng
The extracted vmdk_init_extent takes a BlockBackend object and initializes the format metadata. It is the common part between "qemu-img create" and "blockdev-create". Add a "BlockBackend *pbb" parameter to vmdk_create_extent, to return the opened BB to the caller in the next patch. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-02-01block: Fix hangs in synchronous APIs with iothreadsKevin Wolf
In the block layer, synchronous APIs are often implemented by creating a coroutine that calls the asynchronous coroutine-based implementation and then waiting for completion with BDRV_POLL_WHILE(). For this to work with iothreads (more specifically, when the synchronous API is called in a thread that is not the home thread of the block device, so that the coroutine will run in a different thread), we must make sure to call aio_wait_kick() at the end of the operation. Many places are missing this, so that BDRV_POLL_WHILE() keeps hanging even if the condition has long become false. Note that bdrv_dec_in_flight() involves an aio_wait_kick() call. This corresponds to the BDRV_POLL_WHILE() in the drain functions, but it is generally not enough for most other operations because they haven't set the return value in the coroutine entry stub yet. To avoid race conditions there, we need to kick after setting the return value. The race window is small enough that the problem doesn't usually surface in the common path. However, it does surface and causes easily reproducible hangs if the operation can return early before even calling bdrv_inc/dec_in_flight, which many of them do (trivial error or no-op success paths). The bug in bdrv_truncate(), bdrv_check() and bdrv_invalidate_cache() is slightly different: These functions even neglected to schedule the coroutine in the home thread of the node. This avoids the hang, but is obviously wrong, too. Fix those to schedule the coroutine in the right AioContext in addition to adding aio_wait_kick() calls. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-02-01block: Replace qdict_put() by qdict_put_obj() where appropriateMarkus Armbruster
Patch created mechanically by rerunning: $ spatch --sp-file scripts/coccinelle/qobject.cocci \ --macro-file scripts/cocci-macro-file.h \ --dir block --in-place Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-02-01qcow2: Assert that refcount block offsets fit in the refcount tableAlberto Garcia
Refcount table entries have a field to store the offset of the refcount block. The rest of the bits of the entry are currently reserved. The offset is always taken from the entry using REFT_OFFSET_MASK to ensure that we only use the bits that belong to that field. While that mask is used every time we read from the refcount table, it is never used when we write to it. Due to the other constraints of the qcow2 format QEMU can never produce refcount block offsets that don't fit in that field so any such offset when allocating a refcount block would indicate a bug in QEMU. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-02-01mirror: Block the source BlockDriverState in mirror_start_job()Alberto Garcia
The mirror_start_job() function used for the commit-active job blocks the source, target and all intermediate nodes for the duration of the job. target <- intermediate <- source Since 4ef85a9c2339 this function creates a dummy mirror_top_bs that goes on top of the source node, and it is this dummy node that gets blocked instead. The source node is never blocked or added to the job's list of nodes. target <- intermediate <- source <- mirror_top At the moment I don't think it is possible to exploit this problem because any additional job on 'source' would either be forbidden for other reasons or it would need to involve an additional node that is blocked, causing an error. This can be seen in the error messages, however, because they never refer to the source node being blocked: $ qemu-img create -f qcow2 hd0.qcow2 1M $ qemu-img create -f qcow2 -b hd0.qcow2 hd1.qcow2 $ qemu-io -c 'write 0 1M' hd0.qcow2 $ $QEMU -drive if=none,file=hd1.qcow2,node-name=hd1 { "execute": "qmp_capabilities" } { "execute": "block-commit", "arguments": {"device": "hd1", "speed": 256}} { "execute": "block-stream", "arguments": {"device": "hd1"}} { "error": {"class": "GenericError", "desc": "Node 'hd0' is busy: block device is in use by block job: commit"}} After this patch the error message refers to 'hd1', as it should. The expected output of iotest 141 also needs to be updated for the same reason. Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-02-01mirror: Release the dirty bitmap if mirror_start_job() failsAlberto Garcia
At the moment I don't see how to make this function fail after the dirty bitmap has been created, but if that was possible then we would hit the assert(QLIST_EMPTY(&bs->dirty_bitmaps)) in bdrv_close(). Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-01-31block/sheepdog: Convert from DPRINTF() macro to trace eventsLaurent Vivier
Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20181213162727.17438-5-lvivier@redhat.com [mreitz: Fixed sheepdog_snapshot_create_inode's format string to use PRIx32 for uint32_ts] Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-01-31block/file-posix: Convert from DPRINTF() macro to trace eventsLaurent Vivier
Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20181213162727.17438-4-lvivier@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-01-31block/curl: Convert from DPRINTF() macro to trace eventsLaurent Vivier
Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Richard W.M. Jones <rjones@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20181213162727.17438-3-lvivier@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-01-31block/ssh: Convert from DPRINTF() macro to trace eventsLaurent Vivier
Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Richard W.M. Jones <rjones@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20181213162727.17438-2-lvivier@redhat.com [mreitz: Fixed type of ssh_{read,write}_return's parameter to be ssize_t instead of size_t] Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-01-31qapi: add x-debug-query-block-graphVladimir Sementsov-Ogievskiy
Add a new command, returning block nodes (and their users) graph. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20181221170909.25584-2-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-01-24throttle-groups: fix restart coroutine iothread raceStefan Hajnoczi
The following QMP command leads to a crash when iothreads are used: { 'execute': 'device_del', 'arguments': {'id': 'data'} } The backtrace involves the queue restart coroutine where tgm->throttle_state is a NULL pointer because throttle_group_unregister_tgm() has already been called: (gdb) bt full #0 0x00005585a7a3b378 in qemu_mutex_lock_impl (mutex=0xffffffffffffffd0, file=0x5585a7bb3d54 "block/throttle-groups.c", line=412) at util/qemu-thread-posix.c:64 err = <optimized out> __PRETTY_FUNCTION__ = "qemu_mutex_lock_impl" __func__ = "qemu_mutex_lock_impl" #1 0x00005585a79be074 in throttle_group_restart_queue_entry (opaque=0x5585a9de4eb0) at block/throttle-groups.c:412 _f = <optimized out> data = 0x5585a9de4eb0 tgm = 0x5585a9079440 ts = 0x0 tg = 0xffffffffffffff98 is_write = false empty_queue = 255 This coroutine should not execute in the iothread after the throttle group member has been unregistered! The root cause is that the device_del code path schedules the restart coroutine in the iothread while holding the AioContext lock. Therefore the iothread cannot execute the coroutine until after device_del releases the lock - by this time it's too late. This patch adds a reference count to ThrottleGroupMember so we can synchronously wait for restart coroutines to complete. Once they are done it is safe to unregister the ThrottleGroupMember. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190114133257.30299-2-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>