aboutsummaryrefslogtreecommitdiff
path: root/block
AgeCommit message (Collapse)Author
2009-09-12Fix sys-queue.h conflict for goodBlue Swirl
Problem: Our file sys-queue.h is a copy of the BSD file, but there are some additions and it's not entirely compatible. Because of that, there have been conflicts with system headers on BSD systems. Some hacks have been introduced in the commits 15cc9235840a22c289edbe064a9b3c19c5f49896, f40d753718c72693c5f520f0d9899f6e50395e94, 96555a96d724016e13190b28cffa3bc929ac60dc and 3990d09adf4463eca200ad964cc55643c33feb50 but the fixes were fragile. Solution: Avoid the conflict entirely by renaming the functions and the file. Revert the previous hacks. Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2009-09-11block: add aio_flush operationChristoph Hellwig
Instead stalling the VCPU while serving a cache flush try to do it asynchronously. Use our good old helper thread pool to issue an asynchronous fdatasync for raw-posix. Note that while Linux AIO implements a fdatasync operation it is not useful for us because it isn't actually implement in asynchronous fashion. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-09-11block: use fdatasync instead of fsync if possibleChristoph Hellwig
If we are flushing the caches for our image files we only care about the data (including the metadata required for accessing it) but not things like timestamp updates. So try to use fdatasync instead of fsync to implement the flush operations. Unfortunately many operating systems still do not support fdatasync, so we add a qemu_fdatasync wrapper that uses fdatasync if available as per the _POSIX_SYNCHRONIZED_IO feature macro or fsync otherwise. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-09-09qcow2: Order concurrent AIO requests on the same unallocated clusterKevin Wolf
When two AIO requests write to the same cluster, and this cluster is unallocated, currently both requests allocate a new cluster and the second one merges the first one when it is completed. This means an cluster allocation, a read and a cluster deallocation which cause some overhead. If we simply let the second request wait until the first one is done, we improve overall performance with AIO requests (specifially, qcow2/virtio combinations). This patch maintains a list of in-flight requests that have allocated new clusters. A second request touching the same cluster is limited so that it either doesn't touch the allocation of the first request (so it can have a non-overlapping allocation) or it waits for the first request to complete. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-09-09qcow2: Fix metadata preallocationKevin Wolf
The wrong version of the preallocation patch has been applied, so this is the remaining diff. We can't use truncate to grow the image file to the right size because we don't know if metadata has been written after the last data cluster. In this case truncate would shrink the file and destroy its metadata. Write a zero sector at the end of the virtual disk instead to ensure that the file is big enough. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-09-09Fix spelling in comment.Stefan Weil
The company which made Virtual PC was Connectix. They use the magic string "conectix" in their disk images. Signed-off-by: Stefan Weil <weil@mail.berlios.de> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-08-29Fix gcc 3 warning about uninitialized variableBlue Swirl
If nb_sectors is 0, cluster_offset will not be initialized. Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2009-08-28Don't compile aio code if CONFIG_LINUX_AIO is undefinedStefan Weil
This patch fixes linker errors when building QEMU without Linux AIO support. It is based on suggestions from malc and Kevin Wolf. Signed-off-by: Stefan Weil <weil@mail.berlios.de> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-08-27raw-posix: add Linux native AIO supportChristoph Hellwig
Now that do have a nicer interface to work against we can add Linux native AIO support. It's an extremly thing layer just setting up an iocb for the io_submit system call in the submission path, and registering an eventfd with the qemu poll handler to do complete the iocbs directly from there. This started out based on Anthony's earlier AIO patch, but after estimated 42,000 rewrites and just as many build system changes there's not much left of it. To enable native kernel aio use the aio=native sub-command on the drive command line. I have also added an option to qemu-io to test the aio support without needing a guest. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-08-27raw-posix: refactor AIO supportChristoph Hellwig
Currently the raw-posix.c code contains a lot of knowledge about the asynchronous I/O scheme that is mostly implemented in posix-aio-compat.c. All this code does not really belong here and is getting a bit in the way of implementing native AIO on Linux. So instead move all the guts of the AIO implementation into posix-aio-compat.c (which might need a better name, btw). There's now a very small interface between the AIO providers and raw-posix.c: - an init routine is called from raw_open_common to return an AIO context for this drive. An AIO implementation may either re-use one context for all drives, or use a different one for each as the Linux native AIO support will do. - an submit routine is called from the aio_reav/writev methods to submit an AIO request There are no indirect calls involved in this interface as we need to decide which one to call manually. We will only call the Linux AIO native init function if we were requested to by vl.c, and we will only call the native submit function if we are asked to and the request is properly aligned. That's also the reason why the alignment check actually does the inverse move and now goes into raw-posix.c. The old posix-aio-compat.h headers is removed now that most of it's content is private to posix-aio-compat.c, and instead we add a new block/raw-posix-aio.h headers is created containing only the tiny interface between raw-posix.c and the AIO implementation. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-08-27qcow2: Metadata preallocationKevin Wolf
This introduces a qemu-img create option for qcow2 which allows the metadata to be preallocated, i.e. clusters are reserved in the refcount table and L1/L2 tables, but no data is written to them. Metadata is quite small, so this happens in almost no time. Especially with qcow2 on virtio this helps to gain a bit of performance during the initial writes. However, as soon as create a snapshot, we're back to the normal slow speed, obviously. So this isn't the real fix, but kind of a cheat while we're still having trouble with qcow2 on virtio. Note that the option is disabled by default and needs to be specified explicitly using qemu-img create -f qcow2 -o preallocation=metadata. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-08-27block/vdi.c: Fix several bugsStefan Weil
* The code for option '-static' was wrong, so image creation always created static images. * Static images created with qemu-img did not set header entry blocks_allocated. * The size of the block map must be rounded to the next multiple of SECTOR_SIZE, otherwise the block map is only read partially for block map sizes which are not a multiple of SECTOR_SIZE. Signed-off-by: Stefan Weil <weil@mail.berlios.de> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-08-24eliminate errors about unused results in block/vpc.cNathan Froyd
These errors come up when compiling with gcc-4.3.3 and some older headers: /scratch/froydnj/qemu.git/block/vpc.c: In function 'vpc_create': /scratch/froydnj/qemu.git/block/vpc.c:514: error: value computed is not used /scratch/froydnj/qemu.git/block/vpc.c:516: error: value computed is not used /scratch/froydnj/qemu.git/block/vpc.c:517: error: value computed is not used /scratch/froydnj/qemu.git/block/vpc.c:566: error: value computed is not used Use memcpy to copy the strings instead of strncpy. Signed-off-by: Nathan Froyd <froydnj@codesourcery.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-08-24make pthreads mandatoryChristoph Hellwig
As requested by Anthony make pthreads mandatory. This means we will always have AIO available on posix hosts, and it will also allow enabling the I/O thread unconditionally once it's ready. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-08-15Use pstrcpy to avoid OpenBSD linker warningsBlue Swirl
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2009-08-10Add new block driver for the VDI format (only aio supported)Stefan Weil
This is a new block driver written from scratch to support the VDI format in QEMU. VDI is the native format used by Innotek / SUN VirtualBox. Latest changes: * stripped down version (code for synchronous operations and experimental code removed) * don't open VDI snapshot images (with uuid_link or uuid_parent) * modified vdi_aio_cancel Signed-off-by: Stefan Weil <weil@mail.berlios.de> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Message-Id:
2009-08-01Fix Sparse warning about "expression using sizeof on a function"Blue Swirl
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2009-07-27rename HOST_BSD to CONFIG_BSDJuan Quintela
Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-07-22vmdk: Fix backing file handlingKevin Wolf
Instead of storing the backing file in its own BlockDriverState, VMDK uses the BlockDriverState of the raw image file it opened. This is wrong and breaks functions that access the backing file or protocols. This fix replaces all occurrences of s->hd->backing_* with bs->backing_*. This fixes qemu-iotests failure in 020 (Commit changes to backing file). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-07-20Fix most warnings (errors with -Werror) when debugging is enabledBlue Swirl
I used the following command to enable debugging: perl -p -i -e 's/^\/\/#define DEBUG/#define DEBUG/g' * */* */*/* Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2009-07-16raw-posix: Handle errors in raw_createStefan Weil
In qemu-iotests, some large images are created using qemu-img. Without checks for errors, qemu-img will just create an empty image, and later read / write tests will fail. With the patch, failures during image creation are detected and reported. Signed-off-by: Stefan Weil <weil@mail.berlios.de> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-07-16replace bdrv_{get, put}_buffer with bdrv_{load, save}_vmstateChristoph Hellwig
The VM state offset is a concept internal to the image format. Replace the old bdrv_{get,put}_buffer method that require an index into the image file that is constructed from the VM state offset and an offset into the vmstate with the bdrv_{load,save}_vmstate that just take an offset into the VM state. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-07-10qcow2: Fix L1 table memory allocationKevin Wolf
Contrary to what one could expect, the size of L1 tables is not cluster aligned. So as we're writing whole sectors now instead of single entries, we need to ensure that the L1 table in memory is large enough; otherwise write would access memory after the end of the L1 table. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-07-10qcow1: Fix qcow_aio_writevKevin Wolf
Pass is_write = 1 to qcow_aio_setup when writing. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-07-09Substitute O_DSYNC with O_SYNC or O_FSYNC when needed.G 3
Signed-off-by: John Arbuckle <programmingkidx@gmail.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-07-09Allow adjustment of http block device's readahead size, via a newNolan
":readahead=###:" suffix. Signed-off-by: Nolan Leake <nolan <at> sigbus.net> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-07-09Revert "support colon in filenames"Anthony Liguori
This reverts commit 707c0dbc97cddfe8d2441b8259c6c526d99f2dd8. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-07-09qcow2: Make cache=writethrough defaultKevin Wolf
The performance of qcow2 has improved meanwhile, so we don't need to special-case it any more. Switch the default to write-through caching like all other block drivers. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-06-29qcow2: Cache refcount blocks during snapshot creationKevin Wolf
The really time consuming part of snapshotting is to adjust the reference count of all clusters. Currently after each adjusted cluster the refcount block is written to disk. Don't write each single byte immediately to disk but cache all writes to the refcount block and write them out once we're done with the block. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-06-29block-raw: Allow pread beyond the end of growable imagesKevin Wolf
When using O_DIRECT, qcow2 snapshots didn't work any more for me. In the process of creating the snapshot, qcow2 tries to pwrite some new information (e.g. new L1 table) which will often end up being after the old end of the image file. Now pwrite tries to align things and reads the old contents of the file, read returns 0 because there is nothing to read after the end of file and pwrite is stuck in an endless loop. This patch allows to pread beyond the end of an image file. Whenever the given offset is after the end of the image file, the read succeeds and fills the buffer with zeros. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-06-29support colon in filenamesRam Pai
Problem: It is impossible to feed filenames with the character colon because qemu interprets such names as a protocol. For example filename scsi:0, is interpreted as a protocol by name "scsi". This patch allows user to espace colon characters. For example the above filename can now be expressed either as 'scsi\:0' or as file:scsi:0 anything following the "file:" tag is interpreted verbatin. However if "file:" tag is omitted then any colon characters in the string must be escaped using backslash. Here are couple of examples: scsi\:0\:abc is a local file scsi:0:abc http\://myweb is a local file by name http://myweb file:scsi:0:abc is a local file scsi:0:abc file:http://myweb is a local file by name http://myweb Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-06-29Fix QCOW2 debugging code to compile againFilip Navara
Updated to use C99 comments. Signed-off-by: Filip Navara <filip.navara@gmail.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-06-17Fix opening of read only raw imagesBlue Swirl
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2009-06-16update_refcount: Write complete sectorsKevin Wolf
When updating the refcount blocks in update_refcount(), write complete sectors instead of updating single entries. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-06-16alloc_cluster_link_l2: Write complete sectorsKevin Wolf
When updating the L2 tables in alloc_cluster_link_l2(), write complete sectors instead of updating single entries. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-06-16l2_allocate: Write complete sectorsKevin Wolf
When modifying the L1 table, l2_allocate() needs to write complete sectors instead of single entries. The L1 table is already in memory, reading it from disk in the block layer to align the request is wasted performance. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-06-16qcow2: Rename global functionsKevin Wolf
The qcow2 source is now split into several more manageable files. During the conversion quite some functions that were static before needed to be changed to be global to make the source compile again. We were lucky enough not to get name conflicts with these additional global names, but they are not nice. This patch adds a qcow2_ prefix to all of the global functions in qcow2. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-06-16qcow2: Split out snapshot functionsKevin Wolf
qcow2-snapshot.c contains the code related to snapshotting. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-06-16qcow2: Split out guest cluster functionsKevin Wolf
qcow2-cluster.c contains all functions related to the management of guest clusters, i.e. what the guest sees on its virtual disk. This code is about mapping these guest clusters to host clusters in the image file using the two-level lookup tables. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-06-16qcow2: Split out refcount handlingKevin Wolf
qcow2-refcount.c contains all functions which are related to cluster allocation and management in the image file. A large part of this is the reference counting of these clusters. Also a header file qcow2.h is introduced which will contain the interface of the split qcow2 modules. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-06-16qcow2: Change default cluster size to 64kKevin Wolf
Larger cluster sizes mean less metadata. This has been discussion a few times, let's do it now. This turns 64k clusters on by default for new images. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-06-16raw-posix: Remove O_RDWR when attempting to open a file read-onlyAvi Kivity
When we open a file, we first attempt to open it read-write, then fall back to read-only. Unfortunately we reuse the flags from the previous attempt, so both attempts try to open the file with write permissions, and fail. Fix by clearing the O_RDWR flag from the previous attempt. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-06-16raw-posix: open flags use BDRV_ namespace, not posix namespaceAvi Kivity
The flags argument to raw_common_open() contain bits defined by the BDRV_O_* namespace, not the posix O_* namespace. Adjust to use the correct constants. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-06-15raw-posix: cleanup ioctl methodsChristoph Hellwig
Rename raw_ioctl and raw_aio_ioctl to hdev_ioctl and hdev_aio_ioctl as they are only used for the host device. Also only add them to the method table for the cases where we need them (generic hdev if linux and linux CDROM) instead of declaring stubs and always add them. Signed-off-by: Christoph Hellwig <hch@lst.de>
2009-06-15block: add bdrv_probe_device methodChristoph Hellwig
Add a bdrv_probe_device method to all BlockDriver instances implementing host devices to move matching of host device types into the actual drivers. For now we keep exacly the old matching behaviour based on the devices names, although we really should have better detetion methods based on device information in the future. Signed-off-by: Christoph Hellwig <hch@lst.de>
2009-06-15raw-posix: split hdev driversChristoph Hellwig
Instead of declaring one BlockDriver for all host devices declared one for each type: a generic one for normal disk devices, a Linux floppy driver and a CDROM driver for Linux and FreeBSD. This gets rid of a lot of messy ifdefs and switching based on the type in the various removal device methods. block.c grows a new method to find the correct host device driver based on OS-sepcific criteria, which will later into the actual drivers in a later patch in this series. Signed-off-by: Christoph Hellwig <hch@lst.de>
2009-06-15raw-posix: add a raw_open_common helperChristoph Hellwig
raw_open and hdev_open contain the same basic logic. Add a new raw_open_common helper containing the guts of the open routine and call it from raw_open and hdev_open. We use the new open_flags field in BDRVRawState to allow passing additional open flags to raw_open_common from both. Signed-off-by: Christoph Hellwig <hch@lst.de>
2009-06-15raw-posix: always store open flagsChristoph Hellwig
Both the Linux floppy and the FreeBSD CDROM host device need to store the open flags so that they can re-open the device later. Store the open flags unconditionally to remove the ifdef mess and simply the calling conventions for the later patches in the series. Signed-off-by: Christoph Hellwig <hch@lst.de>
2009-06-06qemu-img: Print available options with -o ?Kevin Wolf
This patch adds a small help text to each of the options in the block drivers which can be displayed by using qemu-img create -f fmt -o ? Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2009-05-28vvfat: one more missing BlockDriver C99 initializer conversionChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>