aboutsummaryrefslogtreecommitdiff
path: root/include/block
AgeCommit message (Collapse)Author
2018-03-26include/block/block_int: Document protocol related functionsFabiano Rosas
Clarify that: - for protocols the brdv_file_open function is used instead of bdrv_open; - when protocol_name is set, a driver should expect to be given only a filename and no other options. Signed-off-by: Fabiano Rosas <farosas@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-19block/accounting: introduce latency histogramVladimir Sementsov-Ogievskiy
Introduce latency histogram statics for block devices. For each accounted operation type, the latency region [0, +inf) is divided into subregions by several points. Then, calculate hits for each subregion. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180309165212.97144-2-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2018-03-19block/mirror: change the semantic of 'force' of block-job-cancelLiang Li
When doing drive mirror to a low speed shared storage, if there was heavy BLK IO write workload in VM after the 'ready' event, drive mirror block job can't be canceled immediately, it would keep running until the heavy BLK IO workload stopped in the VM. Libvirt depends on the current block-job-cancel semantics, which is that when used without a flag after the 'ready' event, the command blocks until data is in sync. However, these semantics are awkward in other situations, for example, people may use drive mirror for realtime backups while still wanting to use block live migration. Libvirt cannot start a block live migration while another drive mirror is in progress, but the user would rather abandon the backup attempt as broken and proceed with the live migration than be stuck waiting for the current drive mirror backup to finish. The drive-mirror command already includes a 'force' flag, which libvirt does not use, although it documented the flag as only being useful to quit a job which is paused. However, since quitting a paused job has the same effect as abandoning a backup in a non-paused job (namely, the destination file is not in sync, and the command completes immediately), we can just improve the documentation to make the force flag obviously useful. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Jeff Cody <jcody@redhat.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Max Reitz <mreitz@redhat.com> Cc: Eric Blake <eblake@redhat.com> Cc: John Snow <jsnow@redhat.com> Reported-by: Huaitong Han <huanhuaitong@didichuxing.com> Signed-off-by: Huaitong Han <huanhuaitong@didichuxing.com> Signed-off-by: Liang Li <liliangleo@didichuxing.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-19blockjobs: add block-job-finalizeJohn Snow
Instead of automatically transitioning from PENDING to CONCLUDED, gate the .prepare() and .commit() phases behind an explicit acknowledgement provided by the QMP monitor if auto_finalize = false has been requested. This allows us to perform graph changes in prepare and/or commit so that graph changes do not occur autonomously without knowledge of the controlling management layer. Transactions that have reached the "PENDING" state together can all be moved to invoke their finalization methods by issuing block_job_finalize to any one job in the transaction. Jobs in a transaction with mixed job->auto_finalize settings will all remain stuck in the "PENDING" state, as if the entire transaction was specified with auto_finalize = false. Jobs that specified auto_finalize = true, however, will still not emit the PENDING event. Signed-off-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-19blockjobs: add PENDING status and eventJohn Snow
For jobs utilizing the new manual workflow, we intend to prohibit them from modifying the block graph until the management layer provides an explicit ACK via block-job-finalize to move the process forward. To distinguish this runstate from "ready" or "waiting," we add a new "pending" event and status. For now, the transition from PENDING to CONCLUDED/ABORTING is automatic, but a future commit will add the explicit block-job-finalize step. Transitions: Waiting -> Pending: Normal transition. Pending -> Concluded: Normal transition. Pending -> Aborting: Late transactional failures and cancellations. Removed Transitions: Waiting -> Concluded: Jobs must go to PENDING first. Verbs: Cancel: Can be applied to a pending job. +---------+ |UNDEFINED| +--+------+ | +--v----+ +---------+CREATED+-----------------+ | +--+----+ | | | | | +--+----+ +------+ | +---------+RUNNING<----->PAUSED| | | +--+-+--+ +------+ | | | | | | | +------------------+ | | | | | | +--v--+ +-------+ | | +---------+READY<------->STANDBY| | | | +--+--+ +-------+ | | | | | | | +--v----+ | | +---------+WAITING<---------------+ | | +--+----+ | | | | | +--v----+ | +---------+PENDING| | | +--+----+ | | | | +--v-----+ +--v------+ | |ABORTING+--->CONCLUDED| | +--------+ +--+------+ | | | +--v-+ | |NULL<--------------------+ +----+ Signed-off-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-19blockjobs: add prepare callbackJohn Snow
Some jobs upon finalization may need to perform some work that can still fail. If these jobs are part of a transaction, it's important that these callbacks fail the entire transaction. We allow for a new callback in addition to commit/abort/clean that allows us the opportunity to have fairly late-breaking failures in the transactional process. The expected flow is: - All jobs in a transaction converge to the PENDING state, added in a forthcoming commit. - Upon being finalized, either automatically or explicitly by the user, jobs prepare to complete. - If any job fails preparation, all jobs call .abort. - Otherwise, they succeed and call .commit. Signed-off-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-19blockjobs: add block_job_dismissJohn Snow
For jobs that have reached their CONCLUDED state, prior to having their last reference put down (meaning jobs that have completed successfully, unsuccessfully, or have been canceled), allow the user to dismiss the job's lingering status report via block-job-dismiss. This gives management APIs the chance to conclusively determine if a job failed or succeeded, even if the event broadcast was missed. Note: block_job_do_dismiss and block_job_decommission happen to do exactly the same thing, but they're called from different semantic contexts, so both aliases are kept to improve readability. Note 2: Don't worry about the 0x04 flag definition for AUTO_DISMISS, she has a friend coming in a future patch to fill the hole where 0x02 is. Verbs: Dismiss: operates on CONCLUDED jobs only. Signed-off-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-19blockjobs: add block_job_verb permission tableJohn Snow
Which commands ("verbs") are appropriate for jobs in which state is also somewhat burdensome to keep track of. As of this commit, it looks rather useless, but begins to look more interesting the more states we add to the STM table. A recurring theme is that no verb will apply to an 'undefined' job. Further, it's not presently possible to restrict the "pause" or "resume" verbs any more than they are in this commit because of the asynchronous nature of how jobs enter the PAUSED state; justifications for some seemingly erroneous applications are given below. ===== Verbs ===== Cancel: Any state except undefined. Pause: Any state except undefined; 'created': Requests that the job pauses as it starts. 'running': Normal usage. (PAUSED) 'paused': The job may be paused for internal reasons, but the user may wish to force an indefinite user-pause, so this is allowed. 'ready': Normal usage. (STANDBY) 'standby': Same logic as above. Resume: Any state except undefined; 'created': Will lift a user's pause-on-start request. 'running': Will lift a pause request before it takes effect. 'paused': Normal usage. 'ready': Will lift a pause request before it takes effect. 'standby': Normal usage. Set-speed: Any state except undefined, though ready may not be meaningful. Complete: Only a 'ready' job may accept a complete request. ======= Changes ======= (1) To facilitate "nice" error checking, all five major block-job verb interfaces in blockjob.c now support an errp parameter: - block_job_user_cancel is added as a new interface. - block_job_user_pause gains an errp paramter - block_job_user_resume gains an errp parameter - block_job_set_speed already had an errp parameter. - block_job_complete already had an errp parameter. (2) block-job-pause and block-job-resume will no longer no-op when trying to pause an already paused job, or trying to resume a job that isn't paused. These functions will now report that they did not perform the action requested because it was not possible. iotests have been adjusted to address this new behavior. (3) block-job-complete doesn't worry about checking !block_job_started, because the permission table guards against this. (4) test-bdrv-drain's job implementation needs to announce that it is 'ready' now, in order to be completed. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-19blockjobs: add status enumJohn Snow
We're about to add several new states, and booleans are becoming unwieldly and difficult to reason about. It would help to have a more explicit bookkeeping of the state of blockjobs. To this end, add a new "status" field and add our existing states in a redundant manner alongside the bools they are replacing: UNDEFINED: Placeholder, default state. Not currently visible to QMP unless changes occur in the future to allow creating jobs without starting them via QMP. CREATED: replaces !!job->co && paused && !busy RUNNING: replaces effectively (!paused && busy) PAUSED: Nearly redundant with info->paused, which shows pause_count. This reports the actual status of the job, which almost always matches the paused request status. It differs in that it is strictly only true when the job has actually gone dormant. READY: replaces job->ready. STANDBY: Paused, but job->ready is true. New state additions in coming commits will not be quite so redundant: WAITING: Waiting on transaction. This job has finished all the work it can until the transaction converges, fails, or is canceled. PENDING: Pending authorization from user. This job has finished all the work it can until the job or transaction is finalized via block_job_finalize. This implies the transaction has converged and left the WAITING phase. ABORTING: Job has encountered an error condition and is in the process of aborting. CONCLUDED: Job has ceased all operations and has a return code available for query and may be dismissed via block_job_dismiss. NULL: Job has been dismissed and (should) be destroyed. Should never be visible to QMP. Some of these states appear somewhat superfluous, but it helps define the expected flow of a job; so some of the states wind up being synchronous empty transitions. Importantly, jobs can be in only one of these states at any given time, which helps code and external users alike reason about the current condition of a job unambiguously. Signed-off-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-19Blockjobs: documentation touchupJohn Snow
Trivial; Document what the job creation flags do, and some general tidying. Signed-off-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-19blockjobs: model single jobs as transactionsJohn Snow
model all independent jobs as single job transactions. It's one less case we have to worry about when we add more states to the transition machine. This way, we can just treat all job lifetimes exactly the same. This helps tighten assertions of the STM graph and removes some conditionals that would have been needed in the coming commits adding a more explicit job lifetime management API. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-16Merge remote-tracking branch 'remotes/jnsnow/tags/bitmaps-pull-request' into ↵Peter Maydell
staging # gpg: Signature made Tue 13 Mar 2018 21:11:43 GMT # gpg: using RSA key 7DEF8106AAFC390E # gpg: Good signature from "John Snow (John Huston) <jsnow@redhat.com>" # Primary key fingerprint: FAEB 9711 A12C F475 812F 18F2 88A9 064D 1835 61EB # Subkey fingerprint: F9B7 ABDB BCAC DF95 BE76 CBD0 7DEF 8106 AAFC 390E * remotes/jnsnow/tags/bitmaps-pull-request: iotests: add dirty bitmap postcopy test iotests: add dirty bitmap migration test migration: add postcopy migration of dirty bitmaps migration: allow qmp command migrate-start-postcopy for any postcopy migration: add is_active_iterate handler migration/qemu-file: add qemu_put_counted_string() migration: include migrate_dirty_bitmaps in migrate_postcopy qapi: add dirty-bitmaps migration capability migration: introduce postcopy-only pending dirty-bitmap: add locked state block/dirty-bitmap: add _locked version of bdrv_reclaim_dirty_bitmap block/dirty-bitmap: fix locking in bdrv_reclaim_dirty_bitmap block/dirty-bitmap: add bdrv_dirty_bitmap_enable_successor() Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-03-13dirty-bitmap: add locked stateVladimir Sementsov-Ogievskiy
Add special state, when qmp operations on the bitmap are disabled. It is needed during bitmap migration. "Frozen" state is not appropriate here, because it looks like bitmap is unchanged. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20180207155837.92351-5-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>
2018-03-13block/dirty-bitmap: add _locked version of bdrv_reclaim_dirty_bitmapVladimir Sementsov-Ogievskiy
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20180207155837.92351-4-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>
2018-03-13nbd: BLOCK_STATUS for standard get_block_status function: client partVladimir Sementsov-Ogievskiy
Minimal realization: only one extent in server answer is supported. Flag NBD_CMD_FLAG_REQ_ONE is used to force this behavior. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180312152126.286890-6-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: grammar tweaks, fix min_block check and 32-bit cap, use -1 instead of errno on failure in nbd_negotiate_simple_meta_context, ensure that block status makes progress on success] Signed-off-by: Eric Blake <eblake@redhat.com>
2018-03-13block/dirty-bitmap: add bdrv_dirty_bitmap_enable_successor()Vladimir Sementsov-Ogievskiy
Enabling bitmap successor is necessary to enable successors of bitmaps being migrated before target vm start. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20180207155837.92351-2-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>
2018-03-13Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into ↵Peter Maydell
staging # gpg: Signature made Mon 12 Mar 2018 16:01:16 GMT # gpg: using RSA key 9CA4ABB381AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" # Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8 * remotes/stefanha/tags/block-pull-request: block: make BDRV_POLL_WHILE() re-entrancy safe Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-03-12block: make BDRV_POLL_WHILE() re-entrancy safeStefan Hajnoczi
Nested BDRV_POLL_WHILE() calls can occur. Currently assert(!wait_->wakeup) fails in AIO_WAIT_WHILE() when this happens. This patch converts the bool wait_->need_kick flag to an unsigned wait_->num_waiters counter. Nesting works correctly because outer AIO_WAIT_WHILE() callers evaluate the condition again after the inner caller completes (invoking the inner caller counts as aio_poll() progress). Reported-by: "fuweiwei (C)" <fuweiwei2@huawei.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20180307124619.6218-1-stefanha@redhat.com Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-03-12Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into stagingPeter Maydell
Block layer patches # gpg: Signature made Fri 09 Mar 2018 15:09:20 GMT # gpg: using RSA key 7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (56 commits) qemu-iotests: fix 203 migration completion race iotests: Tweak 030 in order to trigger a race condition with parallel jobs iotests: Skip test for ENOMEM error iotests: Mark all tests executable iotests: Test creating overlay when guest running qemu-iotests: Test ssh image creation over QMP qemu-iotests: Test qcow2 over file image creation with QMP block: Fail bdrv_truncate() with negative size file-posix: Fix no-op bdrv_truncate() with falloc preallocation ssh: Support .bdrv_co_create ssh: Pass BlockdevOptionsSsh to connect_to_ssh() ssh: QAPIfy host-key-check option ssh: Use QAPI BlockdevOptionsSsh object sheepdog: Support .bdrv_co_create sheepdog: QAPIfy "redundancy" create option nfs: Support .bdrv_co_create nfs: Use QAPI options in nfs_client_open() rbd: Use qemu_rbd_connect() in qemu_rbd_do_create() rbd: Assign s->snap/image_name in qemu_rbd_open() rbd: Support .bdrv_co_create ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-03-09block: x-blockdev-create QMP commandKevin Wolf
This adds a synchronous x-blockdev-create QMP command that can create qcow2 images on a given node name. We don't want to block while creating an image, so this is not the final interface in all aspects, but BlockdevCreateOptionsQcow2 and .bdrv_co_create() are what they actually might look like in the end. In any case, this should be good enough to test whether we interpret BlockdevCreateOptions as we should. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2018-03-09block: Make bdrv_is_whitelisted() publicKevin Wolf
We'll use a separate source file for image creation, and we need to check there whether the requested driver is whitelisted. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2018-03-09qcow2: Use BlockdevRef in qcow2_co_create()Kevin Wolf
Instead of passing a separate BlockDriverState* into qcow2_co_create(), make use of the BlockdevRef that is included in BlockdevCreateOptions. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-03-09block: convert bdrv_check callback to coroutine_fnPaolo Bonzini
Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1516279431-30424-8-git-send-email-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-09block: convert bdrv_invalidate_cache callback to coroutine_fnPaolo Bonzini
QED's bdrv_invalidate_cache implementation would like to reuse functions that acquire/release the metadata locks. Call it from coroutine context to simplify the logic. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1516279431-30424-6-git-send-email-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-08block: add aio_wait_bh_oneshot()Stefan Hajnoczi
Sometimes it's necessary for the main loop thread to run a BH in an IOThread and wait for its completion. This primitive is useful during startup/shutdown to synchronize and avoid race conditions. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20180307144205.20619-2-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-03-06Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into stagingPeter Maydell
Block layer patches # gpg: Signature made Mon 05 Mar 2018 17:45:51 GMT # gpg: using RSA key 7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (38 commits) block: Fix NULL dereference on empty drive error qcow2: Replace align_offset() with ROUND_UP() block/ssh: Add basic .bdrv_truncate() block/ssh: Make ssh_grow_file() blocking block/ssh: Pull ssh_grow_file() from ssh_create() qemu-img: Make resize error message more general qcow2: make qcow2_co_create2() a coroutine_fn block: rename .bdrv_create() to .bdrv_co_create_opts() Revert "IDE: Do not flush empty CDROM drives" block: test blk_aio_flush() with blk->root == NULL block: add BlockBackend->in_flight counter block: extract AIO_WAIT_WHILE() from BlockDriverState aio: rename aio_context_in_iothread() to in_aio_context_home_thread() docs: document how to use the l2-cache-entry-size parameter specs/qcow2: Fix documentation of the compressed cluster descriptor iotest 033: add misaligned write-zeroes test via truncate block: fix write with zero flag set and iovector provided block: Drop unused .bdrv_co_get_block_status() vvfat: Switch to .bdrv_co_block_status() vpc: Switch to .bdrv_co_block_status() ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org> # Conflicts: # include/block/block.h
2018-03-02Include less of the generated modular QAPI headersMarkus Armbruster
In my "build everything" tree, a change to the types in qapi-schema.json triggers a recompile of about 4800 out of 5100 objects. The previous commit split up qmp-commands.h, qmp-event.h, qmp-visit.h, qapi-types.h. Each of these headers still includes all its shards. Reduce compile time by including just the shards we actually need. To illustrate the benefits: adding a type to qapi/migration.json now recompiles some 2300 instead of 4800 objects. The next commit will improve it further. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180211093607.27351-24-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> [eblake: rebase to master] Signed-off-by: Eric Blake <eblake@redhat.com>
2018-03-02block: rename .bdrv_create() to .bdrv_co_create_opts()Stefan Hajnoczi
BlockDriver->bdrv_create() has been called from coroutine context since commit 5b7e1542cfa41a281af9629d31cef03704d976e6 ("block: make bdrv_create adopt coroutine"). Make this explicit by renaming to .bdrv_co_create_opts() and add the coroutine_fn annotation. This makes it obvious to block driver authors that they may yield, use CoMutex, or other coroutine_fn APIs. bdrv_co_create is reserved for the QAPI-based version that Kevin is working on. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20170705102231.20711-2-stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-02block: extract AIO_WAIT_WHILE() from BlockDriverStateStefan Hajnoczi
BlockDriverState has the BDRV_POLL_WHILE() macro to wait on event loop activity while a condition evaluates to true. This is used to implement synchronous operations where it acts as a condvar between the IOThread running the operation and the main loop waiting for the operation. It can also be called from the thread that owns the AioContext and in that case it's just a nested event loop. BlockBackend needs this behavior but doesn't always have a BlockDriverState it can use. This patch extracts BDRV_POLL_WHILE() into the AioWait abstraction, which can be used with AioContext and isn't tied to BlockDriverState anymore. This feature could be built directly into AioContext but then all users would kick the event loop even if they signal different conditions. Imagine an AioContext with many BlockDriverStates, each time a request completes any waiter would wake up and re-check their condition. It's nicer to keep a separate AioWait object for each condition instead. Please see "block/aio-wait.h" for details on the API. The name AIO_WAIT_WHILE() avoids the confusion between AIO_POLL_WHILE() and AioContext polling. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-02aio: rename aio_context_in_iothread() to in_aio_context_home_thread()Stefan Hajnoczi
The name aio_context_in_iothread() is misleading because it also returns true when called on the main AioContext from the main loop thread, which is not an IOThread. This patch renames it to in_aio_context_home_thread() and expands the doc comment to make the semantics clearer. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-02block: Drop unused .bdrv_co_get_block_status()Eric Blake
We are gradually moving away from sector-based interfaces, towards byte-based. Now that all drivers have been updated to provide the byte-based .bdrv_co_block_status(), we can delete the sector-based interface. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-02block: Switch passthrough drivers to .bdrv_co_block_status()Eric Blake
We are gradually moving away from sector-based interfaces, towards byte-based. Update the generic helpers, and all passthrough clients (blkdebug, commit, mirror, throttle) accordingly. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-02block: Add .bdrv_co_block_status() callbackEric Blake
We are gradually moving away from sector-based interfaces, towards byte-based. Now that the block layer exposes byte-based allocation, it's time to tackle the drivers. Add a new callback that operates on as small as byte boundaries. Subsequent patches will then update individual drivers, then finally remove .bdrv_co_get_block_status(). The new code also passes through the 'want_zero' hint, which will allow subsequent patches to further optimize callers that only care about how much of the image is allocated (want_zero is false), rather than full details about runs of zeroes and which offsets the allocation actually maps to (want_zero is true). As part of this effort, fix another part of the documentation: the claim in commit 4c41cb4 that BDRV_BLOCK_ALLOCATED is short for 'DATA || ZERO' is a lie at the block layer (see commit e88ae2264), even though it is how the bit is computed from the driver layer. After all, there are intentionally cases where we return ZERO but not ALLOCATED at the block layer, when we know that a read sees zero because the backing file is too short. Note that the driver interface is thus slightly different than the public interface with regards to which bits will be set, and what guarantees are provided on input. We also add an assertion that any driver using the new callback will make progress (the only time pnum will be 0 is if the block layer already handled an out-of-bounds request, or if there is an error); the old driver interface did not provide this guarantee, which could lead to some inf-loops in drastic corner-case failures. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-01nbd: BLOCK_STATUS constantsVladimir Sementsov-Ogievskiy
Expose the new constants and structs that will be used by both server and client implementations of NBD_CMD_BLOCK_STATUS (the command is currently experimental at https://github.com/NetworkBlockDevice/nbd/blob/extension-blockstatus/doc/proto.md but will hopefully be stabilized soon). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <1518702707-7077-4-git-send-email-vsementsov@virtuozzo.com> [eblake: split from larger patch on server implementation] Signed-off-by: Eric Blake <eblake@redhat.com>
2018-03-01nbd: change indenting in nbd.hVladimir Sementsov-Ogievskiy
Prepared indenting for the following patch. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <1518702707-7077-3-git-send-email-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2018-02-13block: maintain persistent disabled bitmapsVladimir Sementsov-Ogievskiy
To maintain load/store disabled bitmap there is new approach: - deprecate @autoload flag of block-dirty-bitmap-add, make it ignored - store enabled bitmaps as "auto" to qcow2 - store disabled bitmaps without "auto" flag to qcow2 - on qcow2 open load "auto" bitmaps as enabled and others as disabled (except in_use bitmaps) Also, adjust iotests 165 and 176 appropriately. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20180202160752.143796-1-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-02-09block: Simplify bdrv_can_write_zeroes_with_unmap()Eric Blake
We don't need the can_write_zeroes_with_unmap field in BlockDriverInfo, because it is redundant information with supported_zero_flags & BDRV_REQ_MAY_UNMAP. Note that BlockDriverInfo and supported_zero_flags are both per-device settings, rather than global state about the driver as a whole, which means one or both of these bits of information can already be conditional. Let's audit how they were set: crypto: always setting can_write_ to false is pointless (the struct starts life zero-initialized), no use of supported_ nbd: just recently fixed to set can_write_ if supported_ includes MAY_UNMAP (thus this commit effectively reverts bca80059e and solves the problem mentioned there in a more global way) file-posix, iscsi, qcow2: can_write_ is conditional, while supported_ was unconditional; but passing MAY_UNMAP would fail with ENOTSUP if the condition wasn't met qed: can_write_ is unconditional, but pwrite_zeroes lacks support for MAY_UNMAP and supported_ is not set. Perhaps support can be added later (since it would be similar to qcow2), but for now claiming false is no real loss all other drivers: can_write_ is not set, and supported_ is either unset or a passthrough Simplify the code by moving the conditional into supported_zero_flags for all drivers, then dropping the now-unused BDI field. For callers that relied on bdrv_can_write_zeroes_with_unmap(), we return the same per-device settings for drivers that had conditions (no observable change in behavior there); and can now return true (instead of false) for drivers that support passthrough (for example, the commit driver) which gives those drivers the same fix as nbd just got in bca80059e. For callers that relied on supported_zero_flags, we now have a few more places that can avoid a wasted call to pwrite_zeroes() that will just fail with ENOTSUP. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20180126193439.20219-1-eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-02-09Move include qemu/option.h from qemu-common.h to actual usersMarkus Armbruster
qemu-common.h includes qemu/option.h, but most places that include the former don't actually need the latter. Drop the include, and add it to the places that actually need it. While there, drop superfluous includes of both headers, and separate #include from file comment with a blank line. This cleanup makes the number of objects depending on qemu/option.h drop from 4545 (out of 4743) to 284 in my "build everything" tree. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180201111846.21846-20-armbru@redhat.com> [Semantic conflict with commit bdd6a90a9e in block/nvme.c resolved]
2018-02-09Include qapi/qmp/qdict.h exactly where neededMarkus Armbruster
This cleanup makes the number of objects depending on qapi/qmp/qdict.h drop from 4550 (out of 4743) to 368 in my "build everything" tree. For qapi/qmp/qobject.h, the number drops from 4552 to 390. While there, separate #include from file comment with a blank line. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180201111846.21846-13-armbru@redhat.com>
2018-02-09Include qapi/qmp/qobject.h exactly where neededMarkus Armbruster
Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180201111846.21846-11-armbru@redhat.com>
2018-02-09Drop superfluous includes of qapi-types.h and test-qapi-types.hMarkus Armbruster
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180201111846.21846-4-armbru@redhat.com>
2018-02-08block: Move NVMe constants to a separate headerFam Zheng
Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20180116060901.17413-8-famz@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>
2018-02-08block: Introduce buf register APIFam Zheng
Allow block driver to map and unmap a buffer for later I/O, as a performance hint. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20180116060901.17413-5-famz@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>
2018-01-26qapi: add nbd-server-removeVladimir Sementsov-Ogievskiy
Add command for removing an export. It is needed for cases when we don't want to keep the export after the operation on it was completed. The other example is a temporary node, created with blockdev-add. If we want to delete it we should firstly remove any corresponding NBD export. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180119135719.24745-3-vsementsov@virtuozzo.com> [eblake: drop dead nb_clients code] Signed-off-by: Eric Blake <eblake@redhat.com>
2018-01-10nbd: rename nbd_option and nbd_opt_replyVladimir Sementsov-Ogievskiy
Rename nbd_option and nbd_opt_reply to NBDOption and NBDOptionReply to correspond to Qemu coding style and other structures here. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20171122101958.17065-5-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2017-12-22block: Allow graph changes in subtree drained sectionKevin Wolf
We need to remember how many of the drain sections in which a node is were recursive (i.e. subtree drain rather than node drain), so that they can be correctly applied when children are added or removed during the drained section. With this change, it is safe to modify the graph even inside a bdrv_subtree_drained_begin/end() section. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-12-22block: Add bdrv_subtree_drained_begin/end()Kevin Wolf
bdrv_drained_begin() waits for the completion of requests in the whole subtree, but it only actually keeps its immediate bs parameter quiesced until bdrv_drained_end(). Add a version that keeps the whole subtree drained. As of this commit, graph changes cannot be allowed during a subtree drained section, but this will be fixed soon. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-12-22block: Don't notify parents in drain call chainKevin Wolf
This is in preparation for subtree drains, i.e. drained sections that affect not only a single node, but recursively all child nodes, too. Calling the parent callbacks for drain is pointless when we just came from that parent node recursively and leads to multiple increases of bs->quiesce_counter in a single drain call. Don't do it. In order for this to work correctly, the parent callback must be called for every bdrv_drain_begin/end() call, not only for the outermost one: If we have a node N with two parents A and B, recursive draining of A should cause the quiesce_counter of B to increase because its child N is drained independently of B. If now B is recursively drained, too, A must increase its quiesce_counter because N is drained independently of A only now, even if N is going from quiesce_counter 1 to 2. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-12-22block: Remove unused bdrv_requests_pendingFam Zheng
Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-12-18hbitmap: add next_zero functionVladimir Sementsov-Ogievskiy
The function searches for next zero bit. Also add interface for BdrvDirtyBitmap and unit test. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20171012135313.227864-2-vsementsov@virtuozzo.com Signed-off-by: Jeff Cody <jcody@redhat.com>