Age  Commit message  Author
2011-12-05  block: core copy-on-read logic  Stefan Hajnoczi
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05  block: request overlap detection  Stefan Hajnoczi
Detect overlapping requests and remember to align to cluster boundaries if the image format uses them. This assumes that allocating I/O is performed in cluster granularity - which is true for qcow2, qed, etc. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
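The cluster-aligned overlap check described above can be sketched roughly as follows. This is an illustration only, not the patch's code: the helper names `round_to_clusters()`, `ranges_overlap()`, and `requests_overlap()` are hypothetical, sector numbers are assumed to be non-negative, and a `cluster_sectors` value of 0 is assumed to mean "the format has no clusters".

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: widen [sector_num, sector_num + nb_sectors) so that
 * it starts and ends on a cluster boundary. cluster_sectors <= 0 is assumed
 * to mean the image format does not use clusters, so nothing is widened.
 */
static void round_to_clusters(int64_t cluster_sectors,
                              int64_t sector_num, int nb_sectors,
                              int64_t *cluster_sector_num,
                              int64_t *cluster_nb_sectors)
{
    if (cluster_sectors <= 0) {
        *cluster_sector_num = sector_num;
        *cluster_nb_sectors = nb_sectors;
        return;
    }

    int64_t start = sector_num / cluster_sectors * cluster_sectors;
    int64_t end = (sector_num + nb_sectors + cluster_sectors - 1)
                  / cluster_sectors * cluster_sectors;

    *cluster_sector_num = start;
    *cluster_nb_sectors = end - start;
}

/* Two half-open ranges overlap if each one starts before the other ends. */
static bool ranges_overlap(int64_t a_start, int64_t a_len,
                           int64_t b_start, int64_t b_len)
{
    return a_start < b_start + b_len && b_start < a_start + a_len;
}

/* Compare two requests after widening both to cluster granularity. */
static bool requests_overlap(int64_t cluster_sectors,
                             int64_t a_sector, int a_nb,
                             int64_t b_sector, int b_nb)
{
    int64_t a_start, a_len, b_start, b_len;

    round_to_clusters(cluster_sectors, a_sector, a_nb, &a_start, &a_len);
    round_to_clusters(cluster_sectors, b_sector, b_nb, &b_start, &b_len);
    return ranges_overlap(a_start, a_len, b_start, b_len);
}
```

With 128-sector clusters, requests for sectors 10..19 and 100..109 do not touch the same sectors but do share cluster 0, so the aligned check treats them as overlapping.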
2011-12-05  block: wait for overlapping requests  Stefan Hajnoczi
When copy-on-read is enabled it is necessary to wait for overlapping requests before issuing new requests. This prevents races between the copy-on-read and a write request. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05  block: add interface to toggle copy-on-read  Stefan Hajnoczi
The bdrv_enable_copy_on_read()/bdrv_disable_copy_on_read() functions can be used to programmatically enable or disable copy-on-read for a block device. Later patches add the actual copy-on-read logic. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05  block: add request tracking  Stefan Hajnoczi
The block layer does not know about pending requests. This information is necessary for copy-on-read since overlapping requests must be serialized to prevent races that corrupt the image. The BlockDriverState gets a new tracked_request list field which contains all pending requests. Each request is a BdrvTrackedRequest record with sector_num, nb_sectors, and is_write fields. Note that request tracking is always enabled but hopefully this extra work is so small that it doesn't justify adding an enable/disable flag. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
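A minimal sketch of such a pending-request list, assuming a plain singly linked list rather than QEMU's own list macros. Only the `sector_num`, `nb_sectors`, and `is_write` fields come from the commit message; the `Device` type and the `tracked_request_*()` function names are hypothetical stand-ins.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One pending request; the three payload fields mirror the commit message. */
typedef struct TrackedRequest {
    int64_t sector_num;
    int nb_sectors;
    bool is_write;
    struct TrackedRequest *next;   /* assumption: simple singly linked list */
} TrackedRequest;

/* Hypothetical stand-in for the per-device state holding the list. */
typedef struct Device {
    TrackedRequest *tracked_requests;
} Device;

/* Record a request as pending before it is issued. */
static void tracked_request_begin(Device *dev, TrackedRequest *req,
                                  int64_t sector_num, int nb_sectors,
                                  bool is_write)
{
    req->sector_num = sector_num;
    req->nb_sectors = nb_sectors;
    req->is_write = is_write;
    req->next = dev->tracked_requests;
    dev->tracked_requests = req;
}

/* Unlink a request once it has completed. */
static void tracked_request_end(Device *dev, TrackedRequest *req)
{
    TrackedRequest **p = &dev->tracked_requests;

    while (*p != NULL) {
        if (*p == req) {
            *p = req->next;
            req->next = NULL;
            return;
        }
        p = &(*p)->next;
    }
}

/* Count the requests currently in flight. */
static int tracked_request_count(const Device *dev)
{
    int n = 0;

    for (const TrackedRequest *r = dev->tracked_requests; r; r = r->next) {
        n++;
    }
    return n;
}
```

Because the request records can live on the caller's stack, tracking costs a few pointer updates per request, which fits the commit's argument that an enable/disable flag is not worth adding.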
2011-12-05  coroutine: add qemu_co_queue_restart_all()  Stefan Hajnoczi
It's common to wake up all waiting coroutines. Introduce the qemu_co_queue_restart_all() function to do this instead of looping over qemu_co_queue_next() in every caller. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05  qemu-common: add QEMU_ALIGN_DOWN() and QEMU_ALIGN_UP() macros  Stefan Hajnoczi
Add macros for aligning a number to a multiple, for example:

    QEMU_ALIGN_DOWN(500, 2000) = 0
    QEMU_ALIGN_UP(500, 2000) = 2000

Since ALIGN_UP() is a common macro name use the QEMU_* namespace prefix. Hopefully this will protect us from included headers that leak something with a similar name. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
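One way such macros can be written is the divide-then-multiply form below. The macro bodies are an assumption (the log does not show the patch itself); they are chosen to reproduce the two results quoted in the commit message, and they assume non-negative `n` and a positive `m`.

```c
/*
 * Sketch of the two alignment macros, assuming integer operands with
 * n >= 0 and m > 0. Integer division truncates, so dividing and then
 * multiplying by m drops the remainder.
 */
#define QEMU_ALIGN_DOWN(n, m) ((n) / (m) * (m))

/* Rounding up is rounding down after adding (m - 1). */
#define QEMU_ALIGN_UP(n, m) QEMU_ALIGN_DOWN((n) + (m) - 1, (m))
```

Unlike mask-based alignment, this form does not require `m` to be a power of two, which is why a multiple such as 2000 works.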
2011-12-05  block: add bdrv_co_is_allocated() interface  Stefan Hajnoczi
This patch introduces the public bdrv_co_is_allocated() interface which can be used to query image allocation status while the VM is running. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05  block: drop .bdrv_is_allocated() interface  Stefan Hajnoczi
Now that all block drivers have been converted to .bdrv_co_is_allocated() we can drop .bdrv_is_allocated(). Note that the public bdrv_is_allocated() interface is still available but is in fact a synchronous wrapper around .bdrv_co_is_allocated(). Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05  cow: convert to .bdrv_co_is_allocated()  Stefan Hajnoczi
The cow block driver does not keep internal state for cluster lookups. This means it is safe to perform cluster lookups in coroutine context without risk of race conditions that corrupt internal state. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05  vdi: convert to .bdrv_co_is_allocated()  Stefan Hajnoczi
It is trivial to switch from the synchronous .bdrv_is_allocated() interface to .bdrv_co_is_allocated() since vdi_is_allocated() does not block. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05  vvfat: convert to .bdrv_co_is_allocated()  Stefan Hajnoczi
It is trivial to switch from the synchronous .bdrv_is_allocated() interface to .bdrv_co_is_allocated() since vvfat_is_allocated() does not block. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05  block: convert qcow2, qcow, and vmdk to .bdrv_co_is_allocated()  Stefan Hajnoczi
The qcow2, qcow, and vmdk block drivers are based on coroutines. They have a coroutine mutex which protects internal state. We can convert the .bdrv_is_allocated() function to .bdrv_co_is_allocated() by holding the mutex around the cluster lookup operation. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05  qed: convert to .bdrv_co_is_allocated()  Stefan Hajnoczi
The bdrv_qed_is_allocated() function is a synchronous wrapper around qed_find_cluster(), which performs the cluster lookup. In order to convert the synchronous function to a coroutine function we yield instead of using qemu_aio_wait(). Note that QED's cache is already safe for parallel requests so no locking is needed. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05  block: add .bdrv_co_is_allocated()  Stefan Hajnoczi
This patch adds the .bdrv_co_is_allocated() interface which is identical to .bdrv_is_allocated() but runs in coroutine context. Running in coroutine context implies that other coroutines might be performing I/O at the same time. Therefore it must be safe to run while the following BlockDriver functions are in-flight:

    .bdrv_co_readv()
    .bdrv_co_writev()
    .bdrv_co_flush()
    .bdrv_co_is_allocated()

The new .bdrv_co_is_allocated() interface is useful because it can be used when a VM is running, whereas .bdrv_is_allocated() is a synchronous interface that does not cope with parallel requests. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05  block: use public bdrv_is_allocated() interface  Stefan Hajnoczi
There is no need for bdrv_commit() to use the BlockDriver .bdrv_is_allocated() interface directly. Converting to the public interface gives us the freedom to drop .bdrv_is_allocated() entirely in favor of a new .bdrv_co_is_allocated() in the future. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05  qcow2: Fix error path in qcow2_snapshot_load_tmp  Kevin Wolf
If the bdrv_read() of the snapshot's L1 table fails, return the right error code and make sure that the old L1 table is still loaded and we don't break the BlockDriverState completely. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2011-12-05  qcow2: Fix order in qcow2_snapshot_delete  Kevin Wolf
First the snapshot must be deleted and only then the refcounts can be decreased. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2011-12-05  qcow2: Fix order of refcount updates in qcow2_snapshot_goto  Kevin Wolf
The refcount updates must be moved so that in the worst case we can get cluster leaks, but refcounts may never be too low. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2011-12-05  qcow2: Return real error in qcow2_snapshot_goto  Kevin Wolf
Besides fixing the return code, this adds some comments that make clear how the code works and that it potentially breaks images if we fail in the wrong place. Actually fixing this is left for the next patch. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2011-12-05  qcow2: Rework qcow2_snapshot_create error handling  Kevin Wolf
Increase refcounts only after allocating a new L1 table has succeeded in order to make leaks less likely. If writing the snapshot table fails, revert in-memory state to be consistent with that on disk. While at it, make it return the real error codes instead of -1. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2011-12-05  qcow2: Cleanups and memleak fix in qcow2_snapshot_create  Kevin Wolf
sn->id_str could be leaked before this. The rest of this patch changes comments, fixes coding style or removes checks that are unnecessary with g_malloc. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2011-12-05  qcow2: Update snapshot table information at once  Kevin Wolf
Failing in the middle wouldn't help with the integrity of the image, so doing everything in a single request seems better. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2011-12-05  qcow2: Return real error code in qcow2_write_snapshots  Kevin Wolf
Doesn't immediately fix anything as the callers don't use the return value, but they will be fixed next. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2011-12-05  qcow2: Return real error code in qcow2_read_snapshots  Kevin Wolf
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2011-12-05  block: Add coroutine_fn marker to coroutine functions  Dong Xu Wang
Looks better when reviewing these source files. Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05  hmp/qmp: add block_set_io_throttle  Zhi Yong Wu
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05  block: add I/O throttling algorithm  Zhi Yong Wu
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05  CoQueue: introduce qemu_co_queue_wait_insert_head  Zhi Yong Wu
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05  block: add the blockio limits command line support  Zhi Yong Wu
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05  block: Use bdrv functions to replace file operation in cow.c  Li Zhi Hui
The common file operation functions lack error detection, so replace them with the bdrv family of functions. Signed-off-by: Li Zhi Hui <zhihuili@linux.vnet.ibm.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05  xen_disk: remove dead code  Paolo Bonzini
xen_disk.c has support for using synchronous I/O instead of asynchronous I/O, but it is compiled out by default. Remove it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05  qed: adjust the way to get nb_sectors  Zhi Yong Wu
This patch only refactors a few lines to make the code more robust. In qed_read_table_cb() it is fine to use qiov->size because that function does not obviously use a single struct iovec. In the other two functions, if qiov uses more than one struct iovec, the existing approach computes the wrong nb_sectors. To make the code more robust, refactor the existing approach as below. Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Acked-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05  qcow2: avoid reentrant bdrv_read() in copy_sectors()  Stefan Hajnoczi
A BlockDriverState should not issue requests on itself through the public block layer interface. Nested, or reentrant, requests are problematic because they do I/O throttling and request tracking twice. Features like block layer copy-on-read use request tracking to avoid race conditions between concurrent requests. The reentrant request will have to "wait" for its parent request to complete. But the parent is waiting for the reentrant request to make progress so we have reached deadlock. The solution is for block drivers to avoid the public block layer interfaces for reentrant requests. Instead they should call their own internal functions if they wish to perform reentrant requests. This is also a good opportunity to make copy_sectors() a true coroutine_fn. That means calling bdrv_co_writev() instead of bdrv_write(). Behavior is unchanged but we're being explicit that this executes in coroutine context. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05  qcow2: Unlock during COW  Kevin Wolf
Unlocking during COW allows for more parallelism. One change it requires is that buffers are dynamically allocated instead of just using a per-image buffer. While touching the code, drop the synchronous qcow2_read() function and replace it by a bdrv_read() call. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-01  Update version for 1.0 release  (tag: v1.0)  Anthony Liguori
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-11-30  Makefile: use full path for qapi-generated directory  Michael Roth
Generally $(BUILD_DIR) == $(CURDIR), but that isn't necessarily the case, so use $(BUILD_DIR)/qapi-generated for generated files to avoid potentially sticking generated files in odd places outside the build's include paths. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-11-30  qapi: fix guardname generation  Michael Roth
Fix a bug in handling dotted paths, and exclude directory prefixes from generated guardnames to avoid odd/pseudo-random guardnames in generated headers. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-11-28  Update version for 1.0-rc4  (tag: v1.0-rc4)  Anthony Liguori
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-11-28  ccid: Fix buffer overrun in handling of VSC_ATR message  Markus Armbruster
ATR size exceeding the limit is diagnosed, but then we merrily use it anyway, overrunning card->atr[]. The message is read from a character device. Obvious security implications unless the other end of the character device is trusted. Spotted by Coverity. CVE-2011-4111. Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
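The class of bug described above, diagnosing an oversized length but then using it anyway, can be illustrated with the sketch below. It is not the ccid code: the `Card` type, `MAX_ATR_SIZE` value, and `card_set_atr()` function are hypothetical, and only the `atr[]` buffer name is taken from the commit message.

```c
#include <stdint.h>
#include <string.h>

#define MAX_ATR_SIZE 40   /* assumed limit, for illustration only */

/* Hypothetical card state with a fixed-size ATR buffer. */
typedef struct Card {
    uint8_t atr[MAX_ATR_SIZE];
    uint8_t atr_length;
} Card;

/*
 * Store an ATR received from an untrusted peer.
 * Returns 0 on success, -1 if the message is rejected.
 */
static int card_set_atr(Card *card, const uint8_t *data, size_t len)
{
    if (len > sizeof(card->atr)) {
        /* The important part: reject and return, do not fall through. */
        return -1;
    }
    memcpy(card->atr, data, len);
    card->atr_length = (uint8_t)len;
    return 0;
}
```

The length check and the copy sit in the same function, so there is no path on which an oversized length reaches `memcpy()`.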
2011-11-28  Revert "fix out of tree build"  Anthony Liguori
This reverts commit be85c90b74f56dca51782fa3080fcdf88593e045. This patch is incorrect and breaks the build with a freshly cloned git tree. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-11-28  configure: avoid screening of --{en, dis}able-usb-redir options  Max Filippov
The --*dir) option pattern precedes the --{en,dis}able-usb-redir) patterns in the option analysis switch, so the latter options have no effect. The --*dir) pattern was there to accept directory options that are supported by Autoconf but not by QEMU configure. The aim was to let QEMU packagers use the rpm (or similar) macro that overrides directories for their distribution. Replace --*dir with exact option names. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-11-28  cutils: Make strtosz & friends leave follow set to callers  Markus Armbruster
strtosz() & friends require the size to be at the end of the string, or be followed by whitespace or ','. I find this surprising, because the name suggests it works like strtol().

The check simplifies callers that accept exactly that follow set slightly. No such callers exist.

The check is redundant for callers that accept a smaller follow set, and thus need to check themselves anyway. Right now, this is the case for all but one caller. All of them neglected to check, or checked incorrectly, but the previous few commits fixed them up.

Finally, the check is problematic for callers that accept a larger follow set. This is the case in monitor_parse_command(). Fortunately, the problems there are relatively harmless.

monitor_parse_command() uses strtosz() for argument type 'o'. When the last argument is of type 'o', a trailing ',' is diagnosed differently than other trailing junk:

    (qemu) migrate_set_speed 1x
    invalid size
    (qemu) migrate_set_speed 1,
    migrate_set_speed: extraneous characters at the end of line

A related inconsistency exists with non-last arguments. No such command exists, but let's use memsave to explore the inconsistency.

The monitor permits, but does not require whitespace between arguments. For instance, "memsave (1-1)1024foo" is parsed as command memsave with three arguments 0, 1024 and "foo". Yes, this is daft, but at least it's consistently daft.

If I change memsave's second argument from 'i' to 'o', then "memsave (1-1)1foo" is rejected, because the size is followed by an 'f'. But "memsave (1-1)1," is still accepted, and duly saves to file ",".

We don't have any users of strtosz that profit from the check. In the users we have, it appears to encourage sloppy error checking, or gets in the way. Drop the bothersome check. Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-11-28  qemu-img: Tighten parsing of size arguments  Markus Armbruster
strtosz_suffix() fails unless the size is followed by 0, whitespace or ','. Useless here, because we need to fail for any junk following the size, even if it starts with whitespace or ','. Check manually. Things like "qemu-img create xxx 1024," and "qemu-img convert -S '1024 junk'" are now caught. Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
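The "check manually" pattern can be sketched with the standard strtoll() in place of QEMU's strtosz family, whose exact signatures are not shown in this log. The function name `parse_size_strict()` is hypothetical, and suffix handling (K, M, G and so on) is deliberately left out.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a non-negative decimal size and reject ANY trailing characters,
 * including whitespace and ','. Returns 0 on success, -1 on failure.
 * Suffixes are not handled; this only illustrates the follow-set check.
 */
static int parse_size_strict(const char *str, int64_t *size)
{
    char *end;
    long long val;

    errno = 0;
    val = strtoll(str, &end, 10);
    if (end == str) {
        return -1;              /* no digits at all */
    }
    if (errno == ERANGE || val < 0) {
        return -1;              /* overflow or negative size */
    }
    if (*end != '\0') {
        return -1;              /* caller-defined follow set: end of string only */
    }
    *size = val;
    return 0;
}
```

The caller decides what may follow the number by inspecting `end`, which is the strtol()-like behaviour the series moves towards.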
2011-11-28  x86/cpuid: Tighten parsing of tsc_freq=FREQ  Markus Armbruster
cpu_x86_find_by_name() uses strtosz_suffix_unit(), but screws up the error checking. It detects some failures, but not all. Undetected failures result in a zero tsc_khz value (error value -1 divided by 1000), which means "no tsc_freq set". To reproduce, try "-cpu qemu64,tsc_freq=9999999T". strtosz_suffix_unit() fails, because the value overflows int64_t, Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
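The arithmetic behind the silent failure is that C integer division truncates toward zero, so an error value of -1 divided by 1000 becomes 0. A small sketch, with hypothetical function names standing in for the real parsing code:

```c
#include <stdint.h>

/* Buggy pattern: the parser's error value (-1) is converted without a check. */
static int64_t tsc_khz_unchecked(int64_t parsed_hz)
{
    return parsed_hz / 1000;    /* -1 / 1000 == 0, which reads as "not set" */
}

/*
 * Checked pattern: treat a negative parse result as an error before
 * converting. Returns 0 on success, -1 on failure.
 */
static int tsc_khz_checked(int64_t parsed_hz, int64_t *tsc_khz)
{
    if (parsed_hz < 0) {
        return -1;
    }
    *tsc_khz = parsed_hz / 1000;
    return 0;
}
```

In the unchecked version a failed parse is indistinguishable from the user never having supplied tsc_freq at all.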
2011-11-28  vl: Tighten parsing of -m argument  Markus Armbruster
strtosz_suffix() fails unless the size is followed by 0, whitespace or ','. Useless here, because we need to fail for any junk following the size, even if it starts with whitespace or ','. Check manually. Things like "-m 1024," are now caught. Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-11-28  vl: Tighten parsing of -numa's parameter mem  Markus Armbruster
strtosz_suffix() fails unless the size is followed by 0, whitespace or ','. Useless here, because we need to fail for any junk following the size, even if it starts with whitespace or ','. Check manually. Things like

    -smp 4 -numa "node,mem=1024,cpus=0-1" -numa "node,mem=1024 cpus=2-3"

are now caught. Before, the second -numa's argument was silently interpreted as just "node,mem=1024". Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-11-28  cutils: Drop broken support for zero strtosz default_suffix  Markus Armbruster
Commit 9f9b17a4's strtosz() defaults a missing suffix to 'M', except it rejects fractions then (switch case 0). When commit d8427002 introduced strtosz_suffix(), that changed: fractions are no longer rejected, because we go to switch case 'M' on missing suffix now. Not mentioned in commit message, probably unintentional. Not worth changing back now. Because case 0 is still around, you can get the old behavior by passing a zero default_suffix to strtosz_suffix() or strtosz_suffix_unit(). Undocumented and not used. Drop. Commit d8427002 also neglected to update the function comment. Fix it up. Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-11-28  configure: tighten pie toolchain support test for tls variables  Avi Kivity
Some toolchains don't support pie properly when tls variables are in use. Disallow pie when such toolchains are detected. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-11-28  usb-redir: Don't try to write to the chardev after a close event  Hans de Goede
Since we handle close async in a bh, do_write and thus write can get called after receiving a close event. This patch adds a check to the usb-redir write callback to not call qemu_chr_fe_write on a closed backend. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
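The guard described above amounts to checking the backend's state before forwarding a write. The sketch below uses a hypothetical `FakeChardev` type and `fake_chr_write()` function in place of the real character-device API, and returning 0 for a closed backend is an assumption rather than something stated in the commit message.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a character device backend. */
typedef struct FakeChardev {
    bool opened;
    size_t bytes_written;
} FakeChardev;

/* Stand-in for the real backend write; it only counts bytes. */
static int fake_chr_write(FakeChardev *chr, const uint8_t *buf, int len)
{
    (void)buf;
    chr->bytes_written += (size_t)len;
    return len;
}

/* Write callback: skip the backend entirely once it has been closed. */
static int redir_write(FakeChardev *chr, const uint8_t *data, int count)
{
    if (!chr->opened) {
        return 0;   /* assumption: report "nothing written" after close */
    }
    return fake_chr_write(chr, data, count);
}
```

Because the close is handled asynchronously, the callback can still fire after the close event, so the check has to live in the callback itself rather than in the close handler.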