Age | Commit message (Collapse) | Author |
|
Image files have two types of data: immutable data that describes things like
image size, backing files, etc. and mutable data that includes offset and
reference count tables.
Today, image formats aggressively cache mutable data to improve performance. In
some cases, this happens before a guest even starts. When dealing with live
migration, since a file is open on two machines, the caching of meta data can
lead to data corruption.
This patch addresses this by introducing a mechanism to invalidate any cached
mutable data a block driver may have which is then used by the live migration
code.
NB, this still requires coherent shared storage. Addressing migration without
coherent shared storage (i.e. NFS) requires additional work.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
|
|
Recent versions of udev always keep the tray locked so that the kernel
can observe "eject request" events (aka tray button presses) even on
discs that aren't mounted. Add support for these events in the ATAPI
and SCSI cd drive device models.
To let management cope with the behavior of udev, an event should also
be added for "tray opened/closed". This way, after issuing an "eject"
command, management can poll until the guests actually reacts to the
command. They can then issue the "change" command after the tray has been
opened, or try with "eject -f" after a (configurable?) timeout. However,
with this patch and the corresponding support in the device models,
at least it is possible to do a manual two-step eject+change sequence.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
|
|
The biggest change is to rename its prefix from BDRV_IOS to
BLOCK_DEVICE_IO_STATUS.
Next commit will convert the query-block command to the QAPI
and that's how the enumeration is going to be generated.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
|
|
A future commit will convert bdrv_info() to the QAPI and it won't
provide IOS_INVAL.
Luckily all we have to do is to add a new 'iostatus_enabled'
member to BlockDriverState and use it instead.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
|
|
This similarly adds support for coroutine and asynchronous discard.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
Add coroutine support for flush and apply the same emulation that
we already do for read/write. bdrv_aio_flush is simplified to always
go through a coroutine.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
This commit adds support to the BlockDriverState type to keep track
of devices' I/O status.
There are three possible status: BDRV_IOS_OK (no error), BDRV_IOS_ENOSPC
(no space error) and BDRV_IOS_FAILED (any other error). The distinction
between no space and other errors is important because a management
application may want to watch for no space in order to extend the
space assigned to the VM and put it to run again.
Qemu devices supporting the I/O status feature have to enable it
explicitly by calling bdrv_iostatus_enable() _and_ have to be
configured to stop the VM on errors (ie. werror=stop|enospc or
rerror=stop).
In case of multiple errors being triggered in sequence only the first
one is stored. The I/O status is always reset to BDRV_IOS_OK when the
'cont' command is issued.
Next commits will add support to some devices and extend the
query-block/info block commands to return the I/O status information.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
To let device models distinguish between eject and load.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
Device models should be able to set it without an unclean include of
block_int.h.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
It's convenience stuff for block device models, so block.h isn't the
ideal home either, but better than block_int.h.
Permits moving some #include "block_int.h" from device model .h into
.c.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
Need to ask the device, so this requires new BlockDevOps member
is_tray_open().
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
It's a confused mess (see previous commit). No users remain.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
BlockDriverState member removable is a confused mess. It is true when
an ide-cd, scsi-cd or floppy qdev is attached, or when the
BlockDriverState was created with -drive if={floppy,sd} or -drive
if={ide,scsi,xen,none},media=cdrom ("created removable"), except when
an ide-hd, scsi-hd, scsi-generic or virtio-blk qdev is attached.
Three users remain:
1. eject_device(), via bdrv_is_removable() uses it to determine
whether a block device can eject media.
2. bdrv_info() is monitor command "info block". QMP documentation
says "true if the device is removable, false otherwise". From the
monitor user's point of view, the only sensible interpretation of
"is removable" is "can eject media with monitor commands eject and
change".
A block device can eject media unless a device is attached that
doesn't support it. Switch the two users over to new
bdrv_dev_has_removable_media() that returns exactly that.
3. bdrv_getlength() uses to suppress its length cache when media can
change (see commit 46a4e4e6). Media change is either monitor
command change (updates the length cache), monitor command eject
(doesn't update the length cache, easily fixable), or physical
media change (invalidates length cache, not so easily fixable).
I'm refraining from improving anything here, because this series is
long enough already. Instead, I simply switch it over to
bdrv_dev_has_removable_media() as well.
This changes the behavior of the length cache and of monitor commands
eject and change in two cases:
a. drive not created removable, no device attached
The commit makes the drive removable, and defeats the length cache.
Example: -drive if=none
b. drive created removable, but the attached drive is non-removable,
and doesn't call bdrv_set_removable(..., 0) (most devices don't)
The commit makes the drive non-removable, and enables the length
cache.
Example: -drive if=xen,media=cdrom -M xenpv
The other non-removable devices that don't call
bdrv_set_removable() can't currently use a drive created removable,
either because they aren't qdevified, or because they lack a drive
property. Won't stay that way.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
While there, make the locked parameter bool.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
Requires new BlockDevOps member is_medium_locked(). Implement for IDE
and SCSI CD-ROMs.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
The device model knows best when to accept the guest's eject command.
No need to detour through the block layer.
bdrv_eject() can't fail anymore. Make it void.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
Device models should be able to use it without an unclean include of
block_int.h.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
Multiplexing callbacks complicates matters needlessly.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
So we can more easily add device model callbacks.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
For now, this just protects against programming errors like having the
same drive back multiple non-qdev devices, or untimely bdrv_delete().
Later commits will add other interesting uses.
While there, rename BlockDriverState member peer to dev, bdrv_attach()
to bdrv_attach_dev(), bdrv_detach() to bdrv_detach_dev(), and
bdrv_get_attached() to bdrv_get_attached_dev().
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
Account the total latency for read/write/flush requests. This allows
management tools to average it based on a snapshot of the nr ops
counters and allow checking for SLAs or provide statistics.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
Decouple the I/O accounting from bdrv_aio_readv/writev/flush and
make the hardware models call directly into the accounting helpers.
This means:
- we do not count internal requests from image formats in addition
to guest originating I/O
- we do not double count I/O ops if the device model handles it
chunk wise
- we only account I/O once it actuall is done
- can extent I/O accounting to synchronous or coroutine I/O easily
- implement I/O latency tracking easily (see the next patch)
I've conveted the existing device model callers to the new model,
device models that are using synchronous I/O and weren't accounted
before haven't been updated yet. Also scsi hasn't been converted
to the end-to-end accounting as I want to defer that after the pending
scsi layer overhaul.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
This patch introduces bdrv_parse_cache_flags() which sets open flags
given a cache mode. Previously this was duplicated in blockdev.c and
qemu-img.c.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
Add new block driver callbacks bdrv_co_readv/writev, which work on a
QEMUIOVector like bdrv_aio_*, but don't need a callback. The function may only
be called inside a coroutine, so a block driver implementing this interface can
yield instead of blocking during I/O.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
Signed-off-by: Frediano Ziglio <freddy77@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
qemu-img.c wants to count allocated file size of image. Previously it
counts a single bs->file by 'stat' or Window API. As VMDK introduces
multiple file support, the operation becomes format specific with
platform specific meanwhile.
The functions are moved to block/raw-{posix,win32}.c and qemu-img.c calls
bdrv_get_allocated_file_size to count the bs. And also added VMDK code
to count his own extents.
Signed-off-by: Fam Zheng <famcool@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
Signed-off-by: Devin Nakamura <devin122@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
No users of bdrv_get_type_hint() left. bdrv_set_type_hint() can make
the media removable by side effect. Make that explicit.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
When removing a drive from the host-side via drive_del we currently have
the following path:
drive_del
qemu_aio_flush()
bdrv_close() // zaps bs->drv, which makes any subsequent I/O get
// dropped. Works as designed
drive_uninit()
bdrv_delete() // frees the bs. Since the device is still connected to
// bs, any subsequent I/O is a use-after-free.
The value of bs->drv becomes unpredictable on free. As long as it
remains null, I/O still gets dropped, however it could become non-null
at any point after the free resulting SEGVs or other QEMU state
corruption.
To resolve this issue as simply as possible, we can chose to not
actually delete the BlockDriverState pointer. Since bdrv_close()
handles setting the drv pointer to NULL, we just need to remove the
BlockDriverState from the QLIST that is used to enumerate the block
devices. This is currently handled within bdrv_delete, so move this
into its own function, bdrv_make_anon().
The result is that we can now invoke drive_del, this closes the file
descriptors and sets BlockDriverState->drv to NULL which prevents futher
IO to the device, and since we do not free BlockDriverState, we don't
have to worry about the copy retained in the block devices.
We also don't attempt to remove the qdev property since we are no longer
deleting the BlockDriverState on drives with associated drives. This
also allows for removing Drives with no devices associated either.
Reported-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
Other geometry guessing functions already reside in block.c.
Remove some unused or debugging only fields.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
|
|
Certain operations such as drive_del or resize cannot be performed
while external users (eg. block migration) reference the block device.
Add a flag to indicate that.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
Extend the change_cb callback with a reason argument, and use it
to tell drivers about size changes.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
Add a new bdrv_discard method to free blocks in a mapping image, and a new
drive property to set the granularity for these discard. If no discard
granularity support is set discard support is disabled.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
This patch re-factors img_create() moving the code doing the actual
work into block.c where it can be shared with QEMU. This is needed to
be able to create images from QEMU to be used for live snapshots.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
This changes bdrv_flush to return 0 on success and -errno in case of failure.
It's a requirement for implementing proper error handle in users of bdrv_flush.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
|
|
In order to backup snapshots, created from QCOW2 iamge, we want to copy snapshots out of QCOW2 disk to a seperate storage.
The following patch adds a new option in "qemu-img": qemu-img convert -f qcow2 -O qcow2 -s snapshot_name src_img bck_img.
Right now, it only supports to copy the full snapshot, delta snapshot is on the way.
Changes from V1: all the comments from Kevin are addressed:
Add read-only checking
Fix coding style
Change the name from bdrv_snapshot_load to bdrv_snapshot_load_tmp
Signed-off-by: Disheng Su <edison@cloud.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
BDRV_O_CACHE_MASK should have been extended when cache=unsafe introduced a new
flag BDRV_O_NO_FLUSH. There are currently no users that would change their
behaviour because of this, but let's clean it up before things break.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
Block device change command did not copy BDRV_O_SNAPSHOT flag. Thus
the new image did not have this flag and the file got deleted during
opening.
Fix by copying BDRV_O_SNAPSHOT flag.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
People think that their images are corrupted when in fact there are just some
leaked clusters. Differentiating several error cases should make the messages
more comprehensible.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
BlockDriverState member removable controls whether virtual media
change (monitor commands change, eject) is allowed. It is set when
the "type hint" is BDRV_TYPE_CDROM or BDRV_TYPE_FLOPPY.
The type hint is only set by drive_init(). It sets BDRV_TYPE_FLOPPY
for if=floppy. It sets BDRV_TYPE_CDROM for media=cdrom and if=ide,
scsi, xen, or none.
if=ide and if=scsi work, because the type hint makes it a CD-ROM.
if=xen likewise, I think.
For the same reason, if=none works when it's used by ide-drive or
scsi-disk. For other guest devices, there are problems:
* fdc: you can't change virtual media
$ qemu [...] -drive if=none,id=foo,... -global isa-fdc.driveA=foo
QEMU 0.12.50 monitor - type 'help' for more information
(qemu) eject foo
Device 'foo' is not removable
unless you add media=cdrom, but that makes it readonly.
* virtio: if you add media=cdrom, you can change virtual media. If
you eject, the guest gets I/O errors. If you change, the guest sees
the drive's contents suddenly change.
* scsi-generic: if you add media=cdrom, you can change virtual media.
I didn't test what that does to the guest or the physical device,
but it can't be pretty.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
savevm.c keeps a pointer to the snapshot block device. If you manage
to get that device deleted, the pointer dangles, and the next snapshot
operation will crash & burn. Unplugging a guest device that uses it
does the trick:
$ MALLOC_PERTURB_=234 qemu-system-x86_64 [...]
QEMU 0.12.50 monitor - type 'help' for more information
(qemu) info snapshots
No available block device supports snapshots
(qemu) drive_add auto if=none,file=tmp.qcow2
OK
(qemu) device_add usb-storage,id=foo,drive=none1
(qemu) info snapshots
Snapshot devices: none1
Snapshot list (from none1):
ID TAG VM SIZE DATE VM CLOCK
(qemu) device_del foo
(qemu) info snapshots
Snapshot devices:
Segmentation fault (core dumped)
Move management of that pointer to block.c, and zap it when the device
it points becomes unusable.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
For instance, -device scsi-disk,drive=foo -device scsi-disk,drive=foo
happily creates two SCSI disks connected to the same block device.
It's all downhill from there.
Device usb-storage deliberately attaches twice to the same blockdev,
which fails with the fix in place. Detach before the second attach
there.
Also catch attempt to delete while a guest device model is attached.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
Add new functions that write and flush the written data to disk immediately.
This is what needs to be used for image format metadata to maintain integrity
for cache=... modes that don't use O_DSYNC. (Actually, we only need barriers,
and therefore the functions are defined as such, but flushes is what is
implemented in this patch - we can try to change that later)
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
This is a more flexible alternative to bdrv_iterate().
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
do_commit() and mux_proc_byte() iterate over the list of drives
defined with drive_init(). This misses host block devices defined by
other means. Such means don't exist now, but will be introduced later
in this series.
Change them to use new bdrv_commit_all(), which iterates over all host
block devices.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
That's where they belong semantically (block device host part), even
though the actions are actually executed by guest device code.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
Both bdrv_can_snapshot() and bdrv_has_snapshot() does not work as advertized.
First issue: Their names implies different porpouses, but they do the same thing
and have exactly the same code. Maybe copied and pasted and forgotten?
bdrv_has_snapshot() is called in various places for actually checking if there
is snapshots or not.
Second issue: the way bdrv_can_snapshot() verifies if a block driver supports or
not snapshots does not catch all cases. E.g.: a raw image.
So when do_savevm() is called, first thing it does is to set a global
BlockDriverState to save the VM memory state calling get_bs_snapshots().
static BlockDriverState *get_bs_snapshots(void)
{
BlockDriverState *bs;
DriveInfo *dinfo;
if (bs_snapshots)
return bs_snapshots;
QTAILQ_FOREACH(dinfo, &drives, next) {
bs = dinfo->bdrv;
if (bdrv_can_snapshot(bs))
goto ok;
}
return NULL;
ok:
bs_snapshots = bs;
return bs;
}
bdrv_can_snapshot() may return a BlockDriverState that does not support
snapshots and do_savevm() goes on.
Later on in do_savevm(), we find:
QTAILQ_FOREACH(dinfo, &drives, next) {
bs1 = dinfo->bdrv;
if (bdrv_has_snapshot(bs1)) {
/* Write VM state size only to the image that contains the state */
sn->vm_state_size = (bs == bs1 ? vm_state_size : 0);
ret = bdrv_snapshot_create(bs1, sn);
if (ret < 0) {
monitor_printf(mon, "Error while creating snapshot on '%s'\n",
bdrv_get_device_name(bs1));
}
}
}
bdrv_has_snapshot(bs1) is not checking if the device does support or has
snapshots as explained above. Only in bdrv_snapshot_create() the device is
actually checked for snapshot support.
So, in cases where the first device supports snapshots, and the second does not,
the snapshot on the first will happen anyways. I believe this is not a good
behavior. It should be an all or nothing process.
This patch addresses these issues by making bdrv_can_snapshot() actually do
what it must do and enforces better tests to avoid errors in the middle of
do_savevm(). bdrv_has_snapshot() is removed and replaced by bdrv_can_snapshot()
where appropriate.
bdrv_can_snapshot() was moved from savevm.c to block.c. It makes more sense to me.
The loadvm_state() function was updated too to enforce that when loading a VM at
least all writable devices must support snapshots too.
Signed-off-by: Miguel Di Ciurcio Filho <miguel.filho@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
This patch calls the close handler of the block driver before the qemu
process exits.
This is necessary because the sheepdog block driver releases the lock
of VM images in the close handler.
Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
C defaults to int, so make definition of BDRV_SECTOR_SIZE 64 bit
safe as it and BDRV_SECTOR_MASK may be used against 64 bit addresses.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|