aboutsummaryrefslogtreecommitdiff
path: root/block
AgeCommit message (Collapse)Author
2015-03-10block-backend: Add wrappers for blocksizes and geometry probingEkaterina Tumanova
Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1424087278-49393-5-git-send-email-tumanova@linux.vnet.ibm.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-03-10block: Add driver methods to probe blocksizes and geometryEkaterina Tumanova
Introduce driver methods of defining disk blocksizes (physical and logical) and hard drive geometry. Methods are only implemented for "host_device". For "raw" devices driver calls child's method. For now geometry detection will only work for DASD devices. To check that a local check_for_dasd function was introduced. It calls BIODASDINFO2 ioctl and returns its rc. Blocksizes detection function will probe sizes for DASD devices. Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1424087278-49393-4-git-send-email-tumanova@linux.vnet.ibm.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-03-10raw-posix: Factor block size detection out of raw_probe_alignment()Ekaterina Tumanova
Put it in new probe_logical_blocksize(). Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1424087278-49393-3-git-send-email-tumanova@linux.vnet.ibm.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-03-10blkdebug: fix "once" ruleJohn Snow
Background: The blkdebug scripts are currently engineered so that when a debug event occurs, a prefilter browses a master list of parsed rules for a certain event and adds them to an "active list" of rules to be used for the forthcoming action, provided the events and state numbers match. Then, once the request is received, the last active rule is used to inject an error if certain parameters match. This active list is cleared every time the prefilter injects a new rule for the first time during a debug event. The "once" rule currently causes the error injection, if it is triggered, to only clear the active list. This is insufficient for preventing future injections of the same rule. Remedy: This patch /deletes/ the rule from the list that the prefilter browses, so it is gone for good. In V2, we remove only the rule of interest from the active list instead of allowing the "once" rule to clear the entire list of active rules. Impact: This affects iotests 026. Several ENOSPC tests that used "once" can be seen to have output that shows multiple failure messages. After this patch, the error messages tend to be smaller and less severe, but the injection can still be seen to be working. I have patched the expected output to expect the smaller error messages. Signed-off-by: John Snow <jsnow@redhat.com> Message-id: 1423257977-25630-1-git-send-email-jsnow@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-03-10qcow2: Allow creation with refcount order != 4Max Reitz
Add a creation option to qcow2 for setting the refcount order of images to be created, and respect that option's value. This breaks some test outputs, fix them. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-03-10qcow2: Use symbolic macros in qcow2_amend_optionsMax Reitz
qcow2_amend_options() should not compare options against some inline strings but rather use the symbolic macros available for each of the creation options. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-03-10qcow2: refcount_order parameter for qcow2_create2Max Reitz
Add a refcount_order parameter to qcow2_create2(), use that value for the image header and for calculating the size required for preallocation. For now, always pass 4. This addition requires changes to the calculation of the file size for the "full" and "falloc" preallocation modes. That in turn is a nice opportunity to add a comment about that calculation not necessarily being exact (and that being intentional). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-03-10qcow2: Open images with refcount order != 4Max Reitz
No longer refuse to open images with a different refcount entry width than 16 bits; only reject images with a refcount width larger than 64 bits (which is prohibited by the specification). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-03-10qcow2: More helpers for refcount modificationMax Reitz
Add helper functions for getting and setting refcounts in a refcount array for any possible refcount order, and choose the correct one during refcount initialization. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-03-10qcow2: Helper function for refcount modificationMax Reitz
Since refcounts do not always have to be a uint16_t, all refcount blocks and arrays in memory should not have a specific type (thus they become pointers to void) and for accessing them, two helper functions are used (a getter and a setter). Those functions are called indirectly through function pointers in the BDRVQcowState so they may later be exchanged for different refcount orders. With the check and repair functions using this function, the refcount array they are creating will be in big endian byte order; additionally, using realloc_refcount_array() makes the size of this refcount array always cluster-aligned. Both combined allow rebuild_refcount_structure() to drop the bounce buffer which was used to convert parts of the refcount array to big endian byte order and store them on disk. Instead, those parts can now be written directly. [ kwolf: Fixed a build failure on 32 bit and another with old glib ] Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-03-10qcow2: Helper for refcount array reallocationMax Reitz
Add a helper function for reallocating a refcount array, independent of the refcount order. The newly allocated space is zeroed and the function handles failed reallocations gracefully. The helper function will always align the buffer size to a cluster boundary; if storing the refcounts in such an array in big endian byte order, this makes it possible to write parts of the array directly as refcount blocks into the image file. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-03-10qcow2: Use 64 bits for refcount valuesMax Reitz
Refcounts may have a width of up to 64 bits, so qemu should use the same width to represent refcount values internally. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-03-10qcow2: Use unsigned addend for update_refcount()Max Reitz
update_refcount() and qcow2_update_cluster_refcount() currently take a signed addend. At least one caller passes a value directly derived from an absolute refcount that should be reached ("l2_refcount - 1" in expand_zero_clusters_in_l1()). Therefore, the addend should be unsigned as well; this will be especially important for 64 bit refcounts. Because update_refcount() then no longer knows whether the refcount should be increased or decreased, it now requires an additional flag which specified exactly that. The same applies to qcow2_update_cluster_refcount(). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-03-10qcow2: Only return status from qcow2_get_refcountMax Reitz
Refcounts can theoretically be of type uint64_t; in order to be able to represent the full range, qcow2_get_refcount() cannot use a single variable to represent both all refcount values and also keep some values reserved for errors. One solution would be to add an Error pointer parameter to qcow2_get_refcount(); however, no caller could (currently) pass that error message, so it would have to be emitted immediately and be passed to the next caller by returning -EIO or something similar. Therefore, an Error parameter does not offer any advantages here. The solution applied by this patch is simpler to use. Because no caller would be able to pass the error message, they would have to print it and free it, whereas with this patch the caller only needs to pass the returned integer (which is often a no-op from the code perspective, because that integer will be stored in a variable "ret" which will be returned by the fail path of many callers). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-03-10qcow2: Do not return new value after refcount updateMax Reitz
qcow2_update_cluster_refcount() does not have any quick access to the new refcount value, it has to call qcow2_get_refcount(). Some callers do not need that new value at all, others call qcow2_get_refcount() themselves anyway (albeit in a different code path, which can however be easily changed), therefore there is no advantage in making qcow2_update_cluster_refcount() return the new value. Drop it. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-03-10qcow2: Add refcount_bits to format-specific infoMax Reitz
Add the bit width of every refcount entry to the format-specific information. In contrast to lazy_refcounts and the corrupt flag, this should be always emitted, even for compat=0.10 although it does not support any refcount width other than 16 bits. This is because if a boolean is optional, one normally assumes it to be false when omitted; but if an integer is not specified, it is rather difficult to guess its value. This new field breaks some test outputs, fix them. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-03-10qcow2: Add two new fields to BDRVQcowStateMax Reitz
Add two new fields regarding refcount information (the bit width of every entry and the maximum refcount value) to the BDRVQcowState. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-03-09qcow2: Remove unused struct QCowCreateStateKevin Wolf
The only user went away five years ago with commit a9420734 ('qcow2: Simplify image creation'). It's about time to remove it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-03-09block/raw-posix: fix compilation warning on OSXDenis V. Lunev
block/raw-posix.c:947:19: warning: unused variable 's' [-Wunused-variable] BDRVRawState *s = aiocb->bs->opaque; This variable is used only when on of the following macros are defined CONFIG_XFS, CONFIG_FALLOCATE, CONFIG_FALLOCATE_PUNCH_HOLE or CONFIG_FALLOCATE_ZERO_RANGE. Fortunately, CONFIG_FALLOCATE_PUNCH_HOLE and CONFIG_FALLOCATE_ZERO_RANGE could be defined only along with CONFIG_FALLOCATE. Therefore checking for CONFIG_XFS or CONFIG_FALLOCATE would be enough. Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Peter Maydell <peter.maydell@linaro.org> CC: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-03-09sheepdog: selectable object size supportTeruaki Ishizaki
Previously, qemu block driver of sheepdog used hard-coded VDI object size. This patch enables users to handle VDI object size. When you start qemu, you don't need to specify additional command option. But when you create the VDI which doesn't have default object size with qemu-img command, you specify object_size option. If you want to create a VDI of 8MB object size, you need to specify following command option. # qemu-img create -o object_size=8M sheepdog:test1 100M In addition, when you don't specify qemu-img command option, a default value of sheepdog cluster is used for creating VDI. # qemu-img create sheepdog:test2 100M Signed-off-by: Teruaki Ishizaki <ishizaki.teruaki@lab.ntt.co.jp> Acked-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-03-09vpc: Implement bdrv_co_get_block_status()Kevin Wolf
This implements bdrv_co_get_block_status() for VHD images. This can significantly speed up qemu-img convert operation because only with this function implemented sparseness can be considered. (Before, converting a 1 TB empty image took several minutes for me, now it's instantaneous.) Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2015-03-09vpc: Fix size in fixed image creationKevin Wolf
If total_sectors is rounded to match the geometry, total_size needs to be changed as well. Otherwise we end up with an image whose geometry describes a disk larger than the image file, which doesn't end well. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2015-03-03Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into stagingPeter Maydell
- more config options - bootdevice, iscsi, virtio-scsi fixes - build system patches for MinGW and config-devices.mak - qemu_mutex_lock_iothread deadlock fixes - another tiny patch from the record/replay series # gpg: Signature made Mon Mar 2 09:59:14 2015 GMT using RSA key ID 78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: cpus: be more paranoid in avoiding deadlocks cpus: fix deadlock and segfault in qemu_mutex_lock_iothread virtio-scsi: Allocate op blocker reason before blocking Makefile.target: binary depends on config-devices Makefile: don't silence mak file test with V=1 Makefile: fix up parallel building under MSYS+MinGW iscsi: Handle write protected case in reopen Give ivshmem its own config option Create specific config option for "platform-bus" Add specific config options for PCI-E bridges bootdevice: fix segment fault when booting guest with '-kernel' and '-initrd' timer: replace time() with QEMU_CLOCK_HOST virtio-scsi-dataplane: Call blk_set_aio_context within BQL block: Forbid bdrv_set_aio_context outside BQL scsi: give device a parent before setting properties Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-02-27iscsi: Handle write protected case in reopenFam Zheng
Save the write protected flag and check before reopen. Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <1424839208-5195-1-git-send-email-famz@redhat.com> [Fixed typo in the name of the new field. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-26QemuOpts: Drop qemu_opt_set(), rename qemu_opt_set_err(), fix useMarkus Armbruster
qemu_opt_set() is a wrapper around qemu_opt_set() that reports the error with qerror_report_err(). Most of its users assume the function can't fail. Make them use qemu_opt_set_err() with &error_abort, so that should the assumption ever break, it'll break noisily. Just two users remain, in util/qemu-config.c. Switch them to qemu_opt_set_err() as well, then rename qemu_opt_set_err() to qemu_opt_set(). Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2015-02-26QemuOpts: Convert qemu_opt_set_number() to Error, fix its useMarkus Armbruster
Return the Error object instead of reporting it with qerror_report_err(). Change callers that assume the function can't fail to pass &error_abort, so that should the assumption ever break, it'll break noisily. Turns out all callers outside its unit test assume that. We could drop the Error ** argument, but that would make the interface less regular, so don't. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2015-02-26Merge remote-tracking branch 'remotes/armbru/tags/pull-error-2015-02-18' ↵Peter Maydell
into staging Clean up around error_get_pretty(), qerror_report_err() # gpg: Signature made Wed Feb 18 10:10:07 2015 GMT using RSA key ID EB918653 # gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" # gpg: aka "Markus Armbruster <armbru@pond.sub.org>" * remotes/armbru/tags/pull-error-2015-02-18: qemu-char: Avoid qerror_report_err() outside QMP command handlers qemu-img: Avoid qerror_report_err() outside QMP command handlers vl: Avoid qerror_report_err() outside QMP command handlers tpm: Avoid qerror_report_err() outside QMP command handlers numa: Avoid qerror_report_err() outside QMP command handlers net: Avoid qerror_report_err() outside QMP command handlers monitor: Avoid qerror_report_err() outside QMP command handlers monitor: Clean up around monitor_handle_fd_param() error: Use error_report_err() where appropriate error: New convenience function error_report_err() vhost-scsi: Improve error reporting for invalid vhostfd Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-02-25Merge remote-tracking branch 'remotes/armbru/tags/pull-monitor-2015-02-18' ↵Peter Maydell
into staging hmp: Normalize HMP command handler names # gpg: Signature made Wed Feb 18 10:59:44 2015 GMT using RSA key ID EB918653 # gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" # gpg: aka "Markus Armbruster <armbru@pond.sub.org>" * remotes/armbru/tags/pull-monitor-2015-02-18: hmp: Name HMP info handler functions hmp_info_SUBCOMMAND() hmp: Name HMP command handler functions hmp_COMMAND() hmp: Clean up declarations for long-gone info handlers Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-02-18hmp: Name HMP command handler functions hmp_COMMAND()Markus Armbruster
Some are called do_COMMAND() (old ones, usually), some hmp_COMMAND(), and sometimes COMMAND pointlessly differs in spelling. Normalize to hmp_COMMAND(), where COMMAND is exactly the command name with '-' replaced by '_'. Exceptions: * do_device_add() and client_migrate_info() *not* renamed to hmp_device_add(), hmp_client_migrate_info(), because they're also QMP handlers. They still need to be converted to QAPI. * do_memory_dump(), do_physical_memory_dump(), do_ioport_read(), do_ioport_write() renamed do hmp_* instead of hmp_x(), hmp_xp(), hmp_i(), hmp_o(), because those names are too cryptic for my taste. * do_info_help() renamed to hmp_info_help() instead of hmp_info(), because it only covers help. Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-02-18error: Use error_report_err() where appropriateMarkus Armbruster
Coccinelle semantic patch: @@ expression E; @@ - error_report("%s", error_get_pretty(E)); - error_free(E); + error_report_err(E); @@ expression E, S; @@ - error_report("%s", error_get_pretty(E)); + error_report_err(E); ( exit(S); | abort(); ) Trivial manual touch-ups in block/sheepdog.c. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2015-02-16block: Remove "growable" from BDSMax Reitz
Now that request clamping is done in the BlockBackend, the "growable" field can be removed from the BlockDriverState. All BDSs are now treated as being "growable" (that is, they are allowed to grow; they are not necessarily actually able to). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1423162705-32065-16-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-02-16block: Clamp BlockBackend requestsMax Reitz
BlockBackend is used as the interface between the block layer and guest devices. It should therefore assure that all requests are clamped to the image size. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1423162705-32065-15-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-02-16block: Add Error parameter to bdrv_find_protocol()Max Reitz
The argument given to bdrv_find_protocol() is just a file name, which makes it difficult for the caller to reconstruct what protocol bdrv_find_protocol() was hoping to find. This patch adds an Error parameter to that function to solve this issue. Suggested-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1423162705-32065-4-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-02-16block: Add blk_new_open()Max Reitz
blk_new_with_bs() creates a BlockBackend with an empty BlockDriverState attached to it. Empty BDSs are not nice, therefore add an alternative function which combines blk_new_with_bs() with bdrv_open(). Note: In contrast to bdrv_open() which takes a BlockDriver parameter, blk_new_open() does not take such a parameter. This is because bdrv_open() opens a BlockDriverState, therefore it is natural to be able to set the BlockDriver for that BDS. The fact that bdrv_open() can open more than a single BDS is merely some form of a byproduct. blk_new_open() on the other hand is intended to be used to create a whole tree of BlockDriverStates. Therefore, setting a single BlockDriver does not make much sense. Instead, the drivers to be used for each of the nodes must be configured through the "options" QDict; including the driver of the root BDS. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1423162705-32065-3-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-02-16block: Lift some BDS functions to the BlockBackendMax Reitz
Create the blk_* counterparts for the following bdrv_* functions (which make sense to call on the BlockBackend level): - bdrv_co_write_zeroes() - bdrv_write_compressed() - bdrv_truncate() - bdrv_nb_sectors() - bdrv_discard() - bdrv_load_vmstate() - bdrv_save_vmstate() Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1423162705-32065-2-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-02-16block: vmdk - fixed sizeof() errorJeff Cody
The size compared should be PATH_MAX, rather than sizeof(char *). Reported-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 46d873261433f4527e88885582f96942d61758d6.1423592487.git.jcody@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-02-16nbd: fix the co_queue multi-adding bugBin Wu
When we tested the VM migartion between different hosts with NBD devices, we found if we sent a cancel command after the drive_mirror was just started, a coroutine re-enter error would occur. The stack was as follow: (gdb) bt 00) 0x00007fdfc744d885 in raise () from /lib64/libc.so.6 01) 0x00007fdfc744ee61 in abort () from /lib64/libc.so.6 02) 0x00007fdfca467cc5 in qemu_coroutine_enter (co=0x7fdfcaedb400, opaque=0x0) at qemu-coroutine.c:118 03) 0x00007fdfca467f6c in qemu_co_queue_run_restart (co=0x7fdfcaedb400) at qemu-coroutine-lock.c:59 04) 0x00007fdfca467be5 in coroutine_swap (from=0x7fdfcaf3c4e8, to=0x7fdfcaedb400) at qemu-coroutine.c:96 05) 0x00007fdfca467cea in qemu_coroutine_enter (co=0x7fdfcaedb400, opaque=0x0) at qemu-coroutine.c:123 06) 0x00007fdfca467f6c in qemu_co_queue_run_restart (co=0x7fdfcaedbdc0) at qemu-coroutine-lock.c:59 07) 0x00007fdfca467be5 in coroutine_swap (from=0x7fdfcaf3c4e8, to=0x7fdfcaedbdc0) at qemu-coroutine.c:96 08) 0x00007fdfca467cea in qemu_coroutine_enter (co=0x7fdfcaedbdc0, opaque=0x0) at qemu-coroutine.c:123 09) 0x00007fdfca4a1fa4 in nbd_recv_coroutines_enter_all (s=0x7fdfcaef7dd0) at block/nbd-client.c:41 10) 0x00007fdfca4a1ff9 in nbd_teardown_connection (client=0x7fdfcaef7dd0) at block/nbd-client.c:50 11) 0x00007fdfca4a20f0 in nbd_reply_ready (opaque=0x7fdfcaef7dd0) at block/nbd-client.c:92 12) 0x00007fdfca45ed80 in aio_dispatch (ctx=0x7fdfcae15e90) at aio-posix.c:144 13) 0x00007fdfca45ef1b in aio_poll (ctx=0x7fdfcae15e90, blocking=false) at aio-posix.c:222 14) 0x00007fdfca448c34 in aio_ctx_dispatch (source=0x7fdfcae15e90, callback=0x0, user_data=0x0) at async.c:212 15) 0x00007fdfc8f2f69a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0 16) 0x00007fdfca45c391 in glib_pollfds_poll () at main-loop.c:190 17) 0x00007fdfca45c489 in os_host_main_loop_wait (timeout=1483677098) at main-loop.c:235 18) 0x00007fdfca45c57b in main_loop_wait (nonblocking=0) at main-loop.c:484 19) 0x00007fdfca25f403 in main_loop () at vl.c:2249 20) 0x00007fdfca266fc2 in main (argc=42, argv=0x7ffff517d638, envp=0x7ffff517d790) at vl.c:4814 We find the nbd_recv_coroutines_enter_all function (triggered by a cancel command or a network connection breaking down) will enter a coroutine which is waiting for the sending lock. If the lock is still held by another coroutine, the entering coroutine will be added into the co_queue again. Latter, when the lock is released, a coroutine re-enter error will occur. This bug can be fixed simply by delaying the setting of recv_coroutine as suggested by paolo. After applying this patch, we have tested the cancel operation in mirror phase looply for more than 5 hous and everything is fine. Without this patch, a coroutine re-enter error will occur in 5 minutes. Signed-off-by: Bn Wu <wu.wubin@huawei.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1423552846-3896-1-git-send-email-wu.wubin@huawei.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-02-16nbd: Drop BDS backpointerMax Reitz
Before this patch, the "opaque" pointer in an NBD BDS points to a BDRVNBDState, which contains an NbdClientSession object, which in turn contains a pointer to the BDS. This pointer may become invalid due to bdrv_swap(), so drop it, and instead pass the BDS directly to the nbd-client.c functions which then retrieve the NbdClientSession object from there. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1423256778-3340-2-git-send-email-mreitz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-02-06block/raw-posix.c: Fix raw_getlength() on Mac OS X block devicesProgrammingkid
This patch replaces the dummy code in raw_getlength() for block devices on OS X, which always returned LLONG_MAX, with a real implementation that returns the actual block device size. Signed-off-by: John Arbuckle <programmingkidx@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06qcow2: Rewrite qcow2_alloc_bytes()Max Reitz
qcow2_alloc_bytes() is a function with insufficient error handling and an unnecessary goto. This patch rewrites it. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06block: Give always priority to unused entries in the qcow2 L2 cacheAlberto Garcia
The current algorithm to replace entries from the L2 cache gives priority to newer hits by dividing the hit count of all existing entries by two everytime there is a cache miss. However, if there are several cache misses the hit count of the existing entries can easily go down to 0. This will result in those entries being replaced even when there are others that have never been used. This problem is more noticeable with larger disk images and cache sizes, since the chances of having several misses before the cache is full are higher. If we make sure that the hit count can never go down to 0 again, unused entries will always have priority. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06nbd: fix max_discard/max_transfer_lengthDenis V. Lunev
nbd_co_discard calls nbd_client_session_co_discard which uses uint32_t as the length in bytes of the data to discard due to the following definition: struct nbd_request { uint32_t magic; uint32_t type; uint64_t handle; uint64_t from; uint32_t len; <-- the length of data to be discarded, in bytes } QEMU_PACKED; Thus we should limit bl_max_discard to UINT32_MAX >> BDRV_SECTOR_BITS to avoid overflow. NBD read/write code uses the same structure for transfers. Fix max_transfer_length accordingly. Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Peter Lieven <pl@kamp.de> CC: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06nbd: Improve error messagesMax Reitz
This patch makes use of the Error object for nbd_receive_negotiate() so that errors during negotiation look nicer. Furthermore, this patch adds an additional error message if the received magic was wrong, but would be correct for the other protocol version, respectively: So if an export name was specified, but the NBD server magic corresponds to an old handshake, this condition is explicitly signaled to the user, and vice versa. As these messages are now part of the "Could not open image" error message, additional filtering has to be employed in iotest 083, which this patch does as well. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06block: fix off-by-one error in qcow and qcow2Jeff Cody
This fixes an off-by-one error introduced in 9a29e18. Both qcow and qcow2 need to make sure to leave room for string terminator '\0' for the backing file, so the max length of the non-terminated string is either 1023 or PATH_MAX - 1. Reported-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06qed: check for header size overflowStefan Hajnoczi
Header size is denoted in clusters. The maximum cluster size is 64 MB but there is no limit on header size. Check for uint32_t overflow in case the header size field has a whacky value. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1421065893-18875-2-git-send-email-stefanha@redhat.com Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06block/dmg: improve zeroes handlingPeter Wu
Disk images may contain large all-zeroes gaps (1.66k sectors or 812 MiB is seen in the real world). These blocks (type 2) do not need to be extracted into a temporary buffer, there is no need to allocate memory for these blocks nor to check its length. (For the test image, the maximum uncompressed size is 1054371 bytes, probably for a bzip2-compressed block.) Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 1420566495-13284-13-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06block/dmg: support bzip2 block entry typesPeter Wu
This patch adds support for bzip2-compressed block entries as introduced with OS X 10.4 (source: https://en.wikipedia.org/wiki/Apple_Disk_Image). It was tested against a 5.2G "OS X Yosemite" installation image which stores the BLXX block in the XML property list (instead of resource forks) and has over 5k chunks. New configure entries are added (--enable-bzip2 / --disable-bzip2) to control inclusion of bzip2 functionality (which requires linking against libbz2). The help message suggests that this option is needed for DMG files, but the tests are generic enough that other parts of QEMU can use bzip2 if needed. The identifiers are based on http://newosxbook.com/DMG.html. The decompression routines are based on the zlib case, but as there is no way to reset the decompression state (unlike zlib), memory is allocated and deallocated for every decompression. This should not be problematic as the decompression takes most of the time and as blocks are typically about/over 1 MiB in size, only one allocation is done every 2000 sectors. Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1420566495-13284-12-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06block/dmg: factor out block type checkPeter Wu
In preparation for adding bzip2 support, split the type check into a separate function. Make all offsets relative to the begin of a chunk such that it is easier to recognize the position without having to add up all offsets. Some comments are added to describe the fields. There is no functional change. Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 1420566495-13284-11-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06block/dmg: use SectorNumber from BLKX headerPeter Wu
Previously the sector table parsing relied on the previous offset of the DMG file. Now it uses the sector number from the BLKX header (see http://newosxbook.com/DMG.html). The implementation of dmg2img (from vu1tur) does not base the output sector on the location of the terminator (0xffffffff) either so it should be safe to drop this dependency on the previous state. (It makes somehow makes sense, a terminator should halt further processing of a block and is perhaps used to preallocate some space.) Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 1420566495-13284-10-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06block/dmg: fix sector data offset calculationPeter Wu
This patch addresses two issues: - The data fork offset was not taken into account, resulting in failure to read an InstallESD.dmg file (5164763151 bytes) which had a non-zero DataForkOffset field. - The offset of the previous block ("partition") was unconditionally added to the current block because older files would start the input offset of a new block at zero. Newer files (including vlc-2.1.5.dmg, tuxpaint-0.9.15-macosx.dmg and OS X Yosemite [MAS].dmg) failed in reads because these files have chunk offsets, relative to the begin of a data fork. Now the data offset of the mish is taken into account. While we could check that the data_offset is within the data fork, let's not do that here as it would only result in parse failures on invalid files (rather than gracefully handling such bad files). dmg_read will error out if the offset is incorrect. Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 1420566495-13284-9-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>