aboutsummaryrefslogtreecommitdiff
path: root/block/block-backend.c
AgeCommit message (Collapse)Author
2018-05-15block-backend: simplify blk_get_aio_contextDaniel Henrique Barboza
blk_get_aio_context verifies if BlockDriverState bs is not NULL, return bdrv_get_aio_context(bs) if true or qemu_get_aio_context() otherwise. However, bdrv_get_aio_context from block.c already does this verification itself, also returning qemu_get_aio_context() if bs is NULL: AioContext *bdrv_get_aio_context(BlockDriverState *bs) { return bs ? bs->aio_context : qemu_get_aio_context(); } This patch simplifies blk_get_aio_context to simply call bdrv_get_aio_context instead of replicating the same logic. Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-13block: let blk_add/remove_aio_context_notifier() tolerate BDS changesStefan Hajnoczi
Commit 2019ba0a0197 ("block: Add AioContextNotifier functions to BB") added blk_add/remove_aio_context_notifier() and implemented them by passing through the bdrv_*() equivalent. This doesn't work across bdrv_append(), which detaches child->bs and re-attaches it to a new BlockDriverState. When blk_remove_aio_context_notifier() is called we will access the new BDS instead of the one where the notifier was added! >From the point of view of the blk_*() API user, changes to the root BDS should be transparent. This patch maintains a list of AioContext notifiers in BlockBackend and adds/removes them from the BlockDriverState as needed. Reported-by: Stefano Panella <spanella@gmail.com> Cc: Max Reitz <mreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20180306204819.11266-2-stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2018-03-08block: Fix qemu crash when using scsi-blockDeepa Srinivasan
Starting qemu with the following arguments causes qemu to segfault: ... -device lsi,id=lsi0 -drive file=iscsi:<...>,format=raw,if=none,node-name= iscsi1 -device scsi-block,bus=lsi0.0,id=<...>,drive=iscsi1 This patch fixes blk_aio_ioctl() so it does not pass stack addresses to blk_aio_ioctl_entry() which may be invoked after blk_aio_ioctl() returns. More details about the bug follow. blk_aio_ioctl() invokes blk_aio_prwv() with blk_aio_ioctl_entry as the coroutine parameter. blk_aio_prwv() ultimately calls aio_co_enter(). When blk_aio_ioctl() is executed from within a coroutine context (e.g. iscsi_bh_cb()), aio_co_enter() adds the coroutine (blk_aio_ioctl_entry) to the current coroutine's wakeup queue. blk_aio_ioctl() then returns. When blk_aio_ioctl_entry() executes later, it accesses an invalid pointer: .... BlkRwCo *rwco = &acb->rwco; rwco->ret = blk_co_ioctl(rwco->blk, rwco->offset, rwco->qiov->iov[0].iov_base); <--- qiov is invalid here ... In the case when blk_aio_ioctl() is called from a non-coroutine context, blk_aio_ioctl_entry() executes immediately. But if bdrv_co_ioctl() calls qemu_coroutine_yield(), blk_aio_ioctl() will return. When the coroutine execution is complete, control returns to blk_aio_ioctl_entry() after the call to blk_co_ioctl(). There is no invalid reference after this point, but the function is still holding on to invalid pointers. The fix is to change blk_aio_prwv() to accept a void pointer for the IO buffer rather than a QEMUIOVector. blk_aio_prwv() passes this through in BlkRwCo and the coroutine function casts it to QEMUIOVector or uses the void pointer directly. Signed-off-by: Deepa Srinivasan <deepa.srinivasan@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-03-06Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into stagingPeter Maydell
Block layer patches # gpg: Signature made Mon 05 Mar 2018 17:45:51 GMT # gpg: using RSA key 7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (38 commits) block: Fix NULL dereference on empty drive error qcow2: Replace align_offset() with ROUND_UP() block/ssh: Add basic .bdrv_truncate() block/ssh: Make ssh_grow_file() blocking block/ssh: Pull ssh_grow_file() from ssh_create() qemu-img: Make resize error message more general qcow2: make qcow2_co_create2() a coroutine_fn block: rename .bdrv_create() to .bdrv_co_create_opts() Revert "IDE: Do not flush empty CDROM drives" block: test blk_aio_flush() with blk->root == NULL block: add BlockBackend->in_flight counter block: extract AIO_WAIT_WHILE() from BlockDriverState aio: rename aio_context_in_iothread() to in_aio_context_home_thread() docs: document how to use the l2-cache-entry-size parameter specs/qcow2: Fix documentation of the compressed cluster descriptor iotest 033: add misaligned write-zeroes test via truncate block: fix write with zero flag set and iovector provided block: Drop unused .bdrv_co_get_block_status() vvfat: Switch to .bdrv_co_block_status() vpc: Switch to .bdrv_co_block_status() ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org> # Conflicts: # include/block/block.h
2018-03-05block: Fix NULL dereference on empty drive errorKevin Wolf
blk_error_action() sends a BLOCK_IO_ERROR QMP event which includes the node name of its root node. If the BlockBackend represents an empty drive, there is no root node, so we should not try to access its node name. Make the field optional in the event and include it only when the BlockBackend isn't empty. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2018-03-02Include less of the generated modular QAPI headersMarkus Armbruster
In my "build everything" tree, a change to the types in qapi-schema.json triggers a recompile of about 4800 out of 5100 objects. The previous commit split up qmp-commands.h, qmp-event.h, qmp-visit.h, qapi-types.h. Each of these headers still includes all its shards. Reduce compile time by including just the shards we actually need. To illustrate the benefits: adding a type to qapi/migration.json now recompiles some 2300 instead of 4800 objects. The next commit will improve it further. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180211093607.27351-24-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> [eblake: rebase to master] Signed-off-by: Eric Blake <eblake@redhat.com>
2018-03-02block: add BlockBackend->in_flight counterStefan Hajnoczi
BlockBackend currently relies on BlockDriverState->in_flight to track requests for blk_drain(). There is a corner case where BlockDriverState->in_flight cannot be used though: blk->root can be NULL when there is no medium. This results in a segfault when the NULL pointer is dereferenced. Introduce a BlockBackend->in_flight counter for aio requests so it works even when blk->root == NULL. Based on a patch by Kevin Wolf <kwolf@redhat.com>. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-02-09Move include qemu/option.h from qemu-common.h to actual usersMarkus Armbruster
qemu-common.h includes qemu/option.h, but most places that include the former don't actually need the latter. Drop the include, and add it to the places that actually need it. While there, drop superfluous includes of both headers, and separate #include from file comment with a blank line. This cleanup makes the number of objects depending on qemu/option.h drop from 4545 (out of 4743) to 284 in my "build everything" tree. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180201111846.21846-20-armbru@redhat.com> [Semantic conflict with commit bdd6a90a9e in block/nvme.c resolved]
2018-02-09Include qapi/error.h exactly where neededMarkus Armbruster
This cleanup makes the number of objects depending on qapi/error.h drop from 1910 (out of 4743) to 1612 in my "build everything" tree. While there, separate #include from file comment with a blank line, and drop a useless comment on why qemu/osdep.h is included first. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180201111846.21846-5-armbru@redhat.com> [Semantic conflict with commit 34e304e975 resolved, OSX breakage fixed]
2018-02-08block: Introduce buf register APIFam Zheng
Allow block driver to map and unmap a buffer for later I/O, as a performance hint. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20180116060901.17413-5-famz@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>
2017-11-21block: Don't request I/O permission with BDRV_O_NO_IOKevin Wolf
'qemu-img info' makes sense even when BLK_PERM_CONSISTENT_READ cannot be granted because of a block job in a running qemu process. It already sets BDRV_O_NO_IO to indicate that it doesn't access the guest visible data at all. Check the BDRV_O_NO_IO flags in blk_new_open(), so that I/O related permissions are not unnecessarily requested and 'qemu-img info' can work even if BLK_PERM_CONSISTENT_READ cannot be granted. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com>
2017-11-17block: Make bdrv_next() keep strong referencesMax Reitz
On one hand, it is a good idea for bdrv_next() to return a strong reference because ideally nearly every pointer should be refcounted. This fixes intermittent failure of iotest 194. On the other, it is absolutely necessary for bdrv_next() itself to keep a strong reference to both the BB (in its first phase) and the BDS (at least in the second phase) because when called the next time, it will dereference those objects to get a link to the next one. Therefore, it needs these objects to stay around until then. Just storing the pointer to the next in the iterator is not really viable because that pointer might become invalid as well. Both arguments taken together means we should probably just invoke bdrv_ref() and blk_ref() in bdrv_next(). This means we have to assert that bdrv_next() is always called from the main loop, but that was probably necessary already before this patch and judging from the callers, it also looks to actually be the case. Keeping these strong references means however that callers need to give them up if they decide to abort the iteration early. They can do so through the new bdrv_next_cleanup() function. Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20171110172545.32609-1-mreitz@redhat.com Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-11-13block: Leave valid throttle timers when removing a BDS from a backendAlberto Garcia
If a BlockBackend has I/O limits set then its ThrottleGroupMember structure uses the AioContext from its attached BlockDriverState. Those two contexts must be kept in sync manually. This is not ideal and will be fixed in the future by removing the throttling configuration from the BlockBackend and storing it in an implicit filter node instead, but for now we have to live with this. When you remove the BlockDriverState from the backend then the throttle timers are destroyed. If a new BlockDriverState is later inserted then they are created again using the new AioContext. There are a couple of problems with this: a) The code manipulates the timers directly, leaving the ThrottleGroupMember.aio_context field in an inconsisent state. b) If you remove the I/O limits (e.g by destroying the backend) when the timers are gone then throttle_group_unregister_tgm() will attempt to destroy them again, crashing QEMU. While b) could be fixed easily by allowing the timers to be freed twice, this would result in a situation in which we can no longer guarantee that a valid ThrottleState has a valid AioContext and timers. This patch ensures that the timers and AioContext are always valid when I/O limits are set, regardless of whether the BlockBackend has a BlockDriverState inserted or not. [Fixed "There'a" typo as suggested by Max Reitz <mreitz@redhat.com> --Stefan] Reported-by: sochin jiang <sochin.jiang@huawei.com> Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: e089c66e7c20289b046d782cea4373b765c5bc1d.1510339534.git.berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-11-13block: Check for inserted BlockDriverState in blk_io_limits_disable()Alberto Garcia
When you set I/O limits using block_set_io_throttle or the command line throttling.* options they are kept in the BlockBackend regardless of whether a BlockDriverState is attached to the backend or not. Therefore when removing the limits using blk_io_limits_disable() we need to check if there's a BDS before attempting to drain it, else it will crash QEMU. This can be reproduced very easily using HMP: (qemu) drive_add 0 if=none,throttling.iops-total=5000 (qemu) drive_del none0 Reported-by: sochin jiang <sochin.jiang@huawei.com> Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 0d3a67ce8d948bb33e08672564714dcfb76a3d8c.1510339534.git.berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-11-13throttle-groups: drain before detaching ThrottleStateStefan Hajnoczi
I/O requests hang after stop/cont commands at least since QEMU 2.10.0 with -drive iops=100: (guest)$ dd if=/dev/zero of=/dev/vdb oflag=direct count=1000 (qemu) stop (qemu) cont ...I/O is stuck... This happens because blk_set_aio_context() detaches the ThrottleState while requests may still be in flight: if (tgm->throttle_state) { throttle_group_detach_aio_context(tgm); throttle_group_attach_aio_context(tgm, new_context); } This patch encloses the detach/attach calls in a drained region so no I/O request is left hanging. Also add assertions so we don't make the same mistake again in the future. Reported-by: Yongxue Hong <yhong@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20171110151934.16883-1-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-11-13block: all I/O should be completed before removing throttle timers.Zhengui
In blk_remove_bs, all I/O should be completed before removing throttle timers. If there has inflight I/O, removing throttle timers here will cause the inflight I/O never return. This patch add bdrv_drained_begin before throttle_timers_detach_aio_context to let all I/O completed before removing throttle timers. [Moved declaration of bs as suggested by Alberto Garcia <berto@igalia.com>. --Stefan] Signed-off-by: Zhengui <lizhengui@huawei.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 1508564040-120700-1-git-send-email-lizhengui@huawei.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-09-05block: tidy ThrottleGroupMember initializationsManos Pitsidianakis
Move the CoMutex and CoQueue inits inside throttle_group_register_tgm() which is called whenever a ThrottleGroupMember is initialized. There's no need for them to be separate. Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-09-05block: add aio_context field in ThrottleGroupMemberManos Pitsidianakis
timer_cb() needs to know about the current Aio context of the throttle request that is woken up. In order to make ThrottleGroupMember backend agnostic, this information is stored in an aio_context field instead of accessing it from BlockBackend. Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-09-05block: move ThrottleGroup membership to ThrottleGroupMemberManos Pitsidianakis
This commit eliminates the 1:1 relationship between BlockBackend and throttle group state. Users will be able to create multiple throttle nodes, each with its own throttle group state, in the future. The throttle group state cannot be per-BlockBackend anymore, it must be per-throttle node. This is done by gathering ThrottleGroup membership details from BlockBackendPublic into ThrottleGroupMember and refactoring existing code to use the structure. Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-08-23block-backend: Allow more "can inactivate" casesFam Zheng
These two conditions corresponds to mirror job's source and target, which need to be allowed as they are part of the non-shared storage migration workflow: failing to inactivate either will result in a failure during migration completion. Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20170823134242.12080-3-famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> [eblake: improve comment grammar] Signed-off-by: Eric Blake <eblake@redhat.com>
2017-08-23block-backend: Refactor inactivate checkFam Zheng
The logic will be fixed (extended), move it to a separate function. Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20170823134242.12080-2-famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2017-08-15block-backend: Defer shared_perm tightening migration completionFam Zheng
As in the case of nbd_export_new(), bdrv_invalidate_cache() can be called when migration is still in progress. In this case we are not ready to tighten the shared permissions fenced by blk->disable_perm. Defer to a VM state change handler. Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20170815130740.31229-4-famz@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2017-07-18block: Make blk_all_next() publicKevin Wolf
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>
2017-07-18block: Make blk_get_attached_dev_id() publicKevin Wolf
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>
2017-07-11block: Add PreallocMode to blk_truncate()Max Reitz
blk_truncate() itself will pass that value to bdrv_truncate(), and all callers of blk_truncate() just set the parameter to PREALLOC_MODE_OFF for now. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20170613202107.10125-4-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-07-11block: Add PreallocMode to bdrv_truncate()Max Reitz
For block drivers that just pass a truncate request to the underlying protocol, we can now pass the preallocation mode instead of aborting if it is not PREALLOC_MODE_OFF. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20170613202107.10125-3-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-06-26block: change variable names in BlockDriverStateManos Pitsidianakis
Change the 'int count' parameter in *pwrite_zeros, *pdiscard related functions (and some others) to 'int bytes', as they both refer to bytes. This helps with code legibility. Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Message-id: 20170609101808.13506-1-el13635@mail.ntua.gr Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-06-16block: split BlockAcctStats creation and setupPaolo Bonzini
block_acct_destroy is called unconditionally in blk_delete, but there is no BlockAcctStats function that is called unconditionally in blk_new. Split block_acct_init in two, so that it will be possible to create a QemuMutex in block_acct_init and destroy it in block_acct_cleanup. Cc: Alberto Garcia <berto@igalia.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-19-pbonzini@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Fam Zheng <famz@redhat.com>
2017-06-16throttle-groups: protect throttled requests with a CoMutexPaolo Bonzini
Another possibility is to use tg->lock, which we're holding anyway in both schedule_next_request and throttle_group_co_io_limits_intercept. This would require open-coding the CoQueue however, so I've chosen this alternative. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-10-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>
2017-06-16block: access io_limits_disabled with atomic opsPaolo Bonzini
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-4-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>
2017-06-09block: Fix anonymous BBs in blk_root_inactivate()Kevin Wolf
blk->name isn't an array, but a pointer that can be NULL. Checking for an anonymous BB must involve a NULL check first, otherwise we get crashes. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com>
2017-05-11block: Drop permissions when migration completesKevin Wolf
With image locking, permissions affect other qemu processes as well. We want to be sure that the destination can run, so let's drop permissions on the source when migration completes. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2017-05-11block: New BdrvChildRole.activate() for blk_resume_after_migration()Kevin Wolf
Instead of manually calling blk_resume_after_migration() in migration code after doing bdrv_invalidate_cache_all(), integrate the BlockBackend activation with cache invalidation into a single function. This is achieved with a new callback in BdrvChildRole that is called by bdrv_invalidate_cache_all(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2017-04-28block: Add errp to b{lk,drv}_truncate()Max Reitz
For one thing, this allows us to drop the error message generation from qemu-img.c and blockdev.c and instead have it unified in bdrv_truncate(). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20170328205129.15138-3-mreitz@redhat.com Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-04-27block: Constify data passed by pointer to blk_nameKrzysztof Kozlowski
blk_name() is not modifying data passed to it through pointer and it returns also a pointer to const so the argument can be made const for code safeness. Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-04-11throttle: Remove block from group on hot-unplugEric Blake
When a block device that is part of a throttle group is hot-unplugged, we forgot to remove it from the throttle group. This leaves stale memory around, and causes an easily reproducible crash: $ ./x86_64-softmmu/qemu-system-x86_64 -nodefaults -nographic -qmp stdio \ -device virtio-scsi-pci,bus=pci.0 -drive \ id=drive_image2,if=none,format=raw,file=file2,bps=512000,iops=100,group=foo \ -device scsi-hd,id=image2,drive=drive_image2 -drive \ id=drive_image3,if=none,format=raw,file=file3,bps=512000,iops=100,group=foo \ -device scsi-hd,id=image3,drive=drive_image3 {'execute':'qmp_capabilities'} {'execute':'device_del','arguments':{'id':'image3'}} {'execute':'system_reset'} Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1428810 Suggested-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-id: 20170406190847.29347-1-eblake@redhat.com Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-04-11block: Use bdrv_coroutine_enter to start I/O coroutinesFam Zheng
BDRV_POLL_WHILE waits for the started I/O by releasing bs's ctx then polling the main context, which relies on the yielded coroutine continuing on bs->ctx before notifying qemu_aio_context with bdrv_wakeup(). Thus, using qemu_coroutine_enter to start I/O is wrong because if the coroutine is entered from main loop, co->ctx will be qemu_aio_context, as a result of the "release, poll, acquire" loop of BDRV_POLL_WHILE, race conditions happen when both main thread and the iothread access the same BDS: main loop iothread ----------------------------------------------------------------------- blockdev_snapshot aio_context_acquire(bs->ctx) virtio_scsi_data_plane_handle_cmd bdrv_drained_begin(bs->ctx) bdrv_flush(bs) bdrv_co_flush(bs) aio_context_acquire(bs->ctx).enter ... qemu_coroutine_yield(co) BDRV_POLL_WHILE() aio_context_release(bs->ctx) aio_context_acquire(bs->ctx).return ... aio_co_wake(co) aio_poll(qemu_aio_context) ... co_schedule_bh_cb() ... qemu_coroutine_enter(co) ... /* (A) bdrv_co_flush(bs) /* (B) I/O on bs */ continues... */ aio_context_release(bs->ctx) aio_context_acquire(bs->ctx) Note that in above case, bdrv_drained_begin() doesn't do the "release, poll, acquire" in BDRV_POLL_WHILE, because bs->in_flight == 0. Fix this by using bdrv_coroutine_enter and enter coroutine in the right context. iotests 109 output is updated because the coroutine reenter flow during mirror job complete is different (now through co_queue_wakeup, instead of the unconditional qemu_coroutine_switch before), making the end job len different. Signed-off-by: Fam Zheng <famz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2017-04-07block: Ignore guest dev permissions during incoming migrationKevin Wolf
Usually guest devices don't like other writers to the same image, so they use blk_set_perm() to prevent this from happening. In the migration phase before the VM is actually running, though, they don't have a problem with writes to the image. On the other hand, storage migration needs to be able to write to the image in this phase, so the restrictive blk_set_perm() call of qdev devices breaks it. This patch flags all BlockBackends with a qdev device as blk->disable_perm during incoming migration, which means that the requested permissions are stored in the BlockBackend, but not actually applied to its root node yet. Once migration has finished and the VM should be resumed, the permissions are applied. If they cannot be applied (e.g. because the NBD server used for block migration hasn't been shut down), resuming the VM fails. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Tested-by: Kashyap Chamarthy <kchamart@redhat.com>
2017-03-22block-backend: add drained_begin / drained_end opsJohn Snow
Allow block backends to forward drain requests to their devices/users. The initial intended purpose for this patch is to allow BBs to forward requests along to BlockJobs, which will want to pause if their associated BB has entered a drained region. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 20170316212351.13797-3-jsnow@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>
2017-03-07block: Don't use error_abort in blk_new_openFam Zheng
We have an errp and bdrv_root_attach_child can fail permission check, error_abort is not the best choice here. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-02-28hmp: Request permissions in qemu-ioKevin Wolf
The HMP command 'qemu-io' is a bit tricky because it wants to work on the original BlockBackend, but additional permissions could be required. The details are explained in a comment in the code, but in summary, just request whatever permissions the current qemu-io command needs. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>
2017-02-28block: Add BdrvChildRole.get_parent_desc()Kevin Wolf
For meaningful error messages in the permission system, we need to get some human-readable description of the parent of a BdrvChild. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>
2017-02-28block: Allow error return in BlockDevOps.change_media_cb()Kevin Wolf
Some devices allow a media change between read-only and read-write media. They need to adapt the permissions in their .change_media_cb() implementation, which can fail. So add an Error parameter to the function. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>
2017-02-28block: Request real permissions in blk_new_open()Kevin Wolf
We can figure out the necessary permissions from the flags that the caller passed. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>
2017-02-28block: Add error parameter to blk_insert_bs()Kevin Wolf
Now that blk_insert_bs() requests the BlockBackend permissions for the node it attaches to, it can fail. Instead of aborting, pass the errors to the callers. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>
2017-02-28block: Add permissions to blk_new()Kevin Wolf
We want every user to be specific about the permissions it needs, so we'll pass the initial permissions as parameters to blk_new(). A user only needs to call blk_set_perm() if it wants to change the permissions after the fact. The permissions are stored in the BlockBackend and applied whenever a BlockDriverState should be attached in blk_insert_bs(). This does not include actually choosing the right set of permissions everywhere yet. Instead, the usual FIXME comment is added to each place and will be addressed in individual patches. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>
2017-02-28block: Add permissions to BlockBackendKevin Wolf
The BlockBackend can now store the permissions that its user requires. This is necessary because nodes can be ejected from or inserted into a BlockBackend and all of these operations must make sure that the user still gets what it requested initially. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>
2017-02-28block: Let callers request permissions when attaching a child nodeKevin Wolf
When attaching a node as a child to a new parent, the required and shared permissions for this parent are checked against all other parents of the node now, and an error is returned if there is a conflict. This allows error returns to a function that previously always succeeded, and the same is true for quite a few callers and their callers. Converting all of them within the same patch would be too much, so for now everyone tells that they don't need any permissions and allow everyone else to do anything. This way we can use &error_abort initially and convert caller by caller to pass actual permission requirements and implement error handling. All these places are marked with FIXME comments and it will be the job of the next patches to clean them up again. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>
2017-02-24block: Pass BdrvChild to bdrv_truncate()Kevin Wolf
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2017-02-21block: explicitly acquire aiocontext in aio callbacks that need itPaolo Bonzini
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170213135235.12274-16-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>