aboutsummaryrefslogtreecommitdiff
path: root/block.c
AgeCommit message (Collapse)Author
2013-01-15block: clear dirty bitmap when discardingPaolo Bonzini
Note that resetting bits in the dirty bitmap is done _before_ actually processing the request. Writes, instead, set bits after the request is completed. This way, when there are concurrent write and discard requests, the outcome will always be that the blocks are marked dirty. This scenario should never happen, but it is safer to do it this way. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-15block: fix initialization in bdrv_io_limits_enable()Peter Lieven
bdrv_io_limits_enable() starts a new slice, but does not set io_base correctly for that slice. Here is how io_base is used: bytes_base = bs->nr_bytes[is_write] - bs->io_base.bytes[is_write]; bytes_res = (unsigned) nb_sectors * BDRV_SECTOR_SIZE; if (bytes_base + bytes_res <= bytes_limit) { /* no wait */ } else { /* operation needs to be throttled */ } As a result, any I/O operations that are triggered between now and bs->slice_end are incorrectly limited. If 10 MB of data has been written since the VM was started, QEMU thinks that 10 MB of data has been written in this slice. This leads to a I/O lockup in the guest. We fix this by delaying the start of a new slice to the next call of bdrv_exceed_io_limits(). Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-01-14block: make qiov_is_aligned() publicStefan Hajnoczi
The qiov_is_aligned() function checks whether a QEMUIOVector meets a BlockDriverState's alignment requirements. This is needed by virtio-blk-data-plane so: 1. Move the function from block/raw-posix.c to block/block.c. 2. Make it public in block/block.h. 3. Rename to bdrv_qiov_is_aligned(). 4. Change return type from int to bool. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-14block: do not probe zero-sized disksPaolo Bonzini
A blank CD or DVD is visible as a zero-sized disks. Probing such disks will lead to an EIO and a failure to start the VM. Treating them as raw is a better solution. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-01-11Replace remaining gmtime, localtime by gmtime_r, localtime_rStefan Weil
This allows removing of MinGW specific code and improves reentrancy for POSIX hosts. [Removed unused ret variable in qemu_get_timedate() to fix warning: vl.c: In function ‘qemu_get_timedate’: vl.c:451:16: error: variable ‘ret’ set but not used [-Werror=unused-but-set-variable] -- Stefan Hajnoczi] Signed-off-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2012-12-19softmmu: move include files to include/sysemu/Paolo Bonzini
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19misc: move include files to include/qemu/Paolo Bonzini
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19monitor: move include files to include/monitor/Paolo Bonzini
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19block: move include files to include/block/Paolo Bonzini
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19qapi: move include files to include/qobject/Paolo Bonzini
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-12qemu-io: Add AIO debugging commandsKevin Wolf
This makes the blkdebug suspend/resume functionality available in qemu-io. Use it like this: $ ./qemu-io blkdebug::/tmp/test.qcow2 qemu-io> break write_aio req_a qemu-io> aio_write 0 4k qemu-io> blkdebug: Suspended request 'req_a' qemu-io> resume req_a blkdebug: Resuming request 'req_a' qemu-io> wrote 4096/4096 bytes at offset 0 4 KiB, 1 ops; 0:00:30.71 (133.359788 bytes/sec and 0.0326 ops/sec) Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-11block: bdrv_img_create(): drop unused error handling codeLuiz Capitulino
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-11block: bdrv_img_create(): add Error ** argumentLuiz Capitulino
This commit adds an Error ** argument to bdrv_img_create() and set it appropriately on error. Callers of bdrv_img_create() pass NULL for the new argument and still rely on bdrv_img_create()'s return value. Next commits will change callers to use the Error object instead. Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-11block: Avoid second open for format probingKevin Wolf
This fixes problems that are caused by the additional open/close cycle of the existing format probing, for example related to qemu-nbd without -t option or file descriptor passing. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-11block: Factor out bdrv_open_flagsKevin Wolf
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-11block: Improve bdrv_aio_co_cancel_emKevin Wolf
Instead of waiting for all requests to complete, wait just for the specific request that should be cancelled. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-11-24block: Fix regression for MinGW (assertion caused by short string)Stefan Weil
The local string tmp_filename is passed to function get_tmp_filename which expects a string with minimum size MAX_PATH for w32 hosts. MAX_PATH is 260 and PATH_MAX is 259, so tmp_filename was too short. Commit eba25057b9a5e19d10ace2bc7716667a31297169 introduced this regression. Signed-off-by: Stefan Weil <sw@weilnetz.de> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-11-14aio: rename AIOPool to AIOCBInfoStefan Hajnoczi
Now that AIOPool no longer keeps a freelist, it isn't really a "pool" anymore. Rename it to AIOCBInfo and make it const since it no longer needs to be modified. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-11-14aio: use g_slice_alloc() for AIOCB poolingStefan Hajnoczi
AIO control blocks are frequently acquired and released because each aio request involves at least one AIOCB. Therefore, we pool them to avoid heap allocation overhead. The problem with the freelist approach in AIOPool is thread-safety. If we want BlockDriverStates to associate with AioContexts that execute in multiple threads, then a global freelist becomes a problem. This patch drops the freelist and instead uses g_slice_alloc() which is tuned for per-thread fixed-size object pools. qemu_aio_get() and qemu_aio_release() are now thread-safe. Note that the change from g_malloc0() to g_slice_alloc() should be safe since the freelist reuse case doesn't zero the AIOCB either. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-10-29Merge remote-tracking branch 'kwolf/for-anthony' into stagingAnthony Liguori
* kwolf/for-anthony: (32 commits) osdep: Less restrictive F_SEFL in qemu_dup_flags() qemu-iotests: add testcases for mirroring on-source-error/on-target-error qmp: add pull_event function mirror: add support for on-source-error/on-target-error iostatus: forward block_job_iostatus_reset to block job qemu-iotests: add mirroring test case mirror: implement completion qmp: add drive-mirror command mirror: introduce mirror job block: introduce BLOCK_JOB_READY event block: add block-job-complete block: rename block_job_complete to block_job_completed block: export dirty bitmap information in query-block block: introduce new dirty bitmap functionality block: add bdrv_open_backing_file block: add bdrv_query_stats block: add bdrv_query_info qemu-config: Add new -add-fd command line option monitor: Prevent removing fd from set during init monitor: Enable adding an inherited fd to an fd set ... Conflicts: vl.c Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-10-24iostatus: forward block_job_iostatus_reset to block jobPaolo Bonzini
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-10-24block: export dirty bitmap information in query-blockPaolo Bonzini
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-10-24block: introduce new dirty bitmap functionalityPaolo Bonzini
Assert that write_compressed is never used with the dirty bitmap. Setting the bits early is wrong, because a coroutine might concurrently examine them and copy incomplete data from the source. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-10-24block: add bdrv_open_backing_filePaolo Bonzini
Mirroring runs without the backing file so that it can be copied outside QEMU. However, we need to add it at the time the job is completed and QEMU switches to the target. Factor out the common bits of opening an image and completing a mirroring operation. The new function does not assume that the file is closed immediately after it returns failure, so it keeps the BDRV_O_NO_BACKING flag up-to-date. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-10-24block: add bdrv_query_statsPaolo Bonzini
qmp_query_blockstat cannot have errors, remove the Error argument and create a new public function bdrv_query_stats out of it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-10-24block: add bdrv_query_infoPaolo Bonzini
Extract it out of the implementation of "info block". Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-10-24block: bdrv_create(): don't leak cco.filename on errorLuiz Capitulino
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-10-24block: make bdrv_find_backing_image compare canonical filenamesJeff Cody
Currently, bdrv_find_backing_image compares bs->backing_file with what is passed in as a backing_file name. Mismatches may occur, however, when bs->backing_file and backing_file are not both absolute or relative. Use path_combine() to make sure any relative backing filenames are relative to the current image filename being searched, and then use realpath() to make all comparisons based on absolute filenames. If either backing_file or bs->backing_file is determine to be a protocol, then no filename normalization is performed. This also changes bdrv_find_backing_image to no longer be recursive, but iterative. Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-10-23block: add close notifiersPaolo Bonzini
The first user of close notifiers will be the embedded NBD server. It would be possible to use them to do some of the ad hoc processing (e.g. for block jobs and I/O limits) that is currently done by bdrv_close. Acked-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-10-23block: prepare code for adding block notifiersPaolo Bonzini
There is no reason in principle to skip job cancellation and draining of pending I/O when there is no medium in the disk. Do these unconditionally, which also prepares the code for the next patch. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-10-05block: avoid buffer overrun by using pstrcpy, not strncpyJim Meyering
Also, use PATH_MAX, rather than the arbitrary 1024. Using PATH_MAX is more consistent with other filename-related variables in this file, like backing_filename and tmp_filename. Acked-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Jim Meyering <meyering@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-09-28block: introduce block job errorPaolo Bonzini
The following behaviors are possible: 'report': The behavior is the same as in 1.1. An I/O error, respectively during a read or a write, will complete the job immediately with an error code. 'ignore': An I/O error, respectively during a read or a write, will be ignored. For streaming, the job will complete with an error and the backing file will be left in place. For mirroring, the sector will be marked again as dirty and re-examined later. 'stop': The job will be paused and the job iostatus will be set to failed or nospace, while the VM will keep running. This can only be specified if the block device has rerror=stop and werror=stop or enospc. 'enospc': Behaves as 'stop' for ENOSPC errors, 'report' for others. In all cases, even for 'report', the I/O error is reported as a QMP event BLOCK_JOB_ERROR, with the same arguments as BLOCK_IO_ERROR. It is possible that while stopping the VM a BLOCK_IO_ERROR event will be reported and will clobber the event from BLOCK_JOB_ERROR, or vice versa. This is not really avoidable since stopping the VM completes all pending I/O requests. In fact, it is already possible now that a series of BLOCK_IO_ERROR events are reported with rerror=stop, because vm_stop calls bdrv_drain_all and this can generate further errors. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-28iostatus: reorganize io error codePaolo Bonzini
Move the common part of IDE/SCSI/virtio error handling to the block layer. The new function bdrv_error_action subsumes all three of bdrv_emit_qmp_error_event, vm_stop, bdrv_iostatus_set_err. The same scheme will be used for errors in block jobs. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-28iostatus: change is_read to a boolPaolo Bonzini
Do this while we are touching this part of the code, before introducing more uses of "int is_read". Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-28iostatus: move BlockdevOnError declaration to QAPIPaolo Bonzini
This will let block-stream reuse the enum. Places that used the enums are renamed accordingly. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-28iostatus: rename BlockErrorAction, BlockQMPEventActionPaolo Bonzini
We want to remove knowledge of BLOCK_ERR_STOP_ENOSPC from drivers; drivers should only be told whether to stop/report/ignore the error. On the other hand, we want to keep using the nicer BlockErrorAction name in the drivers. So rename the enums, while leaving aside the names of the enum values for now. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-28block: move job APIs to separate filesPaolo Bonzini
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-28block: helper function, to find the base image of a chainJeff Cody
This is a simple helper function, that will return the base image of a given image chain. Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-28block: add support functions for live commit, to find and delete images.Jeff Cody
Add bdrv_find_overlay(), and bdrv_drop_intermediate(). bdrv_find_overlay(): given 'bs' and the active (topmost) BDS of an image chain, find the image that is the immediate top of 'bs' bdrv_drop_intermediate(): Given 3 BDS (active, top, base), drop images above base up to and including top, and set base to be the backing file of top's overlay node. E.g., this converts: bottom <- base <- intermediate <- top <- active to bottom <- base <- active Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-24block: remove keep_read_only flag from BlockDriverState structJeff Cody
The keep_read_only flag is no longer used, in favor of the bdrv flag BDRV_O_ALLOW_RDWR. Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-24block: convert bdrv_commit() to use bdrv_reopen()Jeff Cody
Currently, bdrv_commit() reopens images r/w itself, via risky _delete() and _open() calls. Use the new safe method for drive reopen. Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-24block: Framework for reopening files safelyJeff Cody
This is based on Supriya Kannery's bdrv_reopen() patch series. This provides a transactional method to reopen multiple images files safely. Image files are queue for reopen via bdrv_reopen_queue(), and the reopen occurs when bdrv_reopen_multiple() is called. Changes are staged in bdrv_reopen_prepare() and in the equivalent driver level functions. If any of the staged images fails a prepare, then all of the images left untouched, and the staged changes for each image abandoned. Block drivers are passed a reopen state structure, that contains: * BDS to reopen * flags for the reopen * opaque pointer for any driver-specific data that needs to be persistent from _prepare to _commit/_abort * reopen queue pointer, if the driver needs to queue additional BDS for a reopen Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-24block: make bdrv_set_enable_write_cache() modify open_flagsJeff Cody
bdrv_set_enable_write_cache() sets the bs->enable_write_cache flag, but without the flag recorded in bs->open_flags, then next time a reopen() is performed the enable_write_cache setting may be inadvertently lost. This will set the flag in open_flags, so it is preserved across reopens. Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-24block: correctly set the keep_read_only flagJeff Cody
I believe the bs->keep_read_only flag is supposed to reflect the initial open state of the device. If the device is initially opened R/O, then commit operations, or reopen operations changing to R/W, are prohibited. Currently, the keep_read_only flag is only accurate for the active layer, and its backing file. Subsequent images end up always having the keep_read_only flag set. For instance, what happens now: [ base ] kro = 1, ro = 1 | v [ snap-1 ] kro = 1, ro = 1 | v [ snap-2 ] kro = 0, ro = 1 | v [ active ] kro = 0, ro = 0 What we want: [ base ] kro = 0, ro = 1 | v [ snap-1 ] kro = 0, ro = 1 | v [ snap-2 ] kro = 0, ro = 1 | v [ active ] kro = 0, ro = 0 Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-12block: Don't forget to delete temporary fileDunrong Huang
The caller would not delete temporary file after failed get_tmp_filename(). Signed-off-by: Dunrong Huang <riegamaths@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-12block: fix block tray statusPavel Hrdina
The tray status should change also if you eject empty block device. Signed-off-by: Pavel Hrdina <phrdina@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-08-15block: Flush parent to OS with cache=unsafeKevin Wolf
Commit 29cdb251 already added a comment that no unnecessary flushes to disk will occur, this patch makes the code even get to the point of the comment. This is mostly theoretical because in practice we only stack one format on top of one protocol, the former implementing flush_to_os and the latter only flush_to_disk. It starts to matter when drivers that are not on top implement flush_to_os. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2012-08-13qmp: query-block: add 'encryption_key_missing' fieldLuiz Capitulino
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com>
2012-08-03block: Use bdrv_get_backing_file_depth()Benoît Canet
Use the dedicated counting function in qmp_query_block in order to propagate the backing file depth to HMP and add backing_file_depth to qmp-commands.hx Signed-off-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2012-08-03block: create bdrv_get_backing_file_depth()Benoît Canet
Create bdrv_get_backing_file_depth() in order to be able to show in QMP and HMP how many ancestors backing an image a block device have. Signed-off-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>