aboutsummaryrefslogtreecommitdiff
path: root/block/file-posix.c
AgeCommit message (Collapse)Author
2019-09-05block: workaround for unaligned byte range in fallocate()Andrey Shinkevich
Revert the commit 118f99442d 'block/io.c: fix for the allocation failure' and use better error handling for file systems that do not support fallocate() for an unaligned byte range. Allow falling back to pwrite in case fallocate() returns EINVAL. Suggested-by: Kevin Wolf <kwolf@redhat.com> Suggested-by: Eric Blake <eblake@redhat.com> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Message-Id: <1566913973-15490-1-git-send-email-andrey.shinkevich@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2019-09-03file-posix: fix request_alignment typoStefan Hajnoczi
Fixes: a6b257a08e3d72219f03e461a52152672fec0612 ("file-posix: Handle undetectable alignment") Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20190827101328.4062-1-stefanha@redhat.com Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-09-03block: posix: Always allocate the first blockNir Soffer
When creating an image with preallocation "off" or "falloc", the first block of the image is typically not allocated. When using Gluster storage backed by XFS filesystem, reading this block using direct I/O succeeds regardless of request length, fooling alignment detection. In this case we fallback to a safe value (4096) instead of the optimal value (512), which may lead to unneeded data copying when aligning requests. Allocating the first block avoids the fallback. Since we allocate the first block even with preallocation=off, we no longer create images with zero disk size: $ ./qemu-img create -f raw test.raw 1g Formatting 'test.raw', fmt=raw size=1073741824 $ ls -lhs test.raw 4.0K -rw-r--r--. 1 nsoffer nsoffer 1.0G Aug 16 23:48 test.raw And converting the image requires additional cluster: $ ./qemu-img measure -f raw -O qcow2 test.raw required size: 458752 fully allocated size: 1074135040 When using format like vmdk with multiple files per image, we allocate one block per file: $ ./qemu-img create -f vmdk -o subformat=twoGbMaxExtentFlat test.vmdk 4g Formatting 'test.vmdk', fmt=vmdk size=4294967296 compat6=off hwversion=undefined subformat=twoGbMaxExtentFlat $ ls -lhs test*.vmdk 4.0K -rw-r--r--. 1 nsoffer nsoffer 2.0G Aug 27 03:23 test-f001.vmdk 4.0K -rw-r--r--. 1 nsoffer nsoffer 2.0G Aug 27 03:23 test-f002.vmdk 4.0K -rw-r--r--. 1 nsoffer nsoffer 353 Aug 27 03:23 test.vmdk I did quick performance test for copying disks with qemu-img convert to new raw target image to Gluster storage with sector size of 512 bytes: for i in $(seq 10); do rm -f dst.raw sleep 10 time ./qemu-img convert -f raw -O raw -t none -T none src.raw dst.raw done Here is a table comparing the total time spent: Type Before(s) After(s) Diff(%) --------------------------------------- real 530.028 469.123 -11.4 user 17.204 10.768 -37.4 sys 17.881 7.011 -60.7 We can see very clear improvement in CPU usage. Signed-off-by: Nir Soffer <nsoffer@redhat.com> Message-id: 20190827010528.8818-2-nsoffer@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-08-19block: Implement .bdrv_has_zero_init_truncate()Max Reitz
We need to implement .bdrv_has_zero_init_truncate() for every block driver that supports truncation and has a .bdrv_has_zero_init() implementation. Implement it the same way each driver implements .bdrv_has_zero_init(). This is at least not any more unsafe than what we had before. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190724171239.8764-5-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-08-17block: fix NetBSD qemu-iotests failurePaolo Bonzini
Opening a block device on NetBSD has an additional step compared to other OSes, corresponding to raw_normalize_devicepath. The error message in that function is slightly different from that in raw_open_common and this was causing spurious failures in qemu-iotests. However, in general it is not important to know what exact step was failing, for example in the qemu-iotests case the error message contains the fairly unequivocal "No such file or directory" text from strerror. We can thus fix the failures by standardizing on a single error message for both raw_open_common and raw_normalize_devicepath; in fact, we can even use error_setg_file_open to make sure the error message is the same as in the rest of QEMU. Message-Id: <20190725095920.28419-1-pbonzini@redhat.com> Tested-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2019-08-16file-posix: Handle undetectable alignmentNir Soffer
In some cases buf_align or request_alignment cannot be detected: 1. With Gluster, buf_align cannot be detected since the actual I/O is done on Gluster server, and qemu buffer alignment does not matter. Since we don't have alignment requirement, buf_align=1 is the best value. 2. With local XFS filesystem, buf_align cannot be detected if reading from unallocated area. In this we must align the buffer, but we don't know what is the correct size. Using the wrong alignment results in I/O error. 3. With Gluster backed by XFS, request_alignment cannot be detected if reading from unallocated area. In this case we need to use the correct alignment, and failing to do so results in I/O errors. 4. With NFS, the server does not use direct I/O, so both buf_align cannot be detected. In this case we don't need any alignment so we can use buf_align=1 and request_alignment=1. These cases seems to work when storage sector size is 512 bytes, because the current code starts checking align=512. If the check succeeds because alignment cannot be detected we use 512. But this does not work for storage with 4k sector size. To determine if we can detect the alignment, we probe first with align=1. If probing succeeds, maybe there are no alignment requirement (cases 1, 4) or we are probing unallocated area (cases 2, 3). Since we don't have any way to tell, we treat this as undetectable alignment. If probing with align=1 fails with EINVAL, but probing with one of the expected alignments succeeds, we know that we found a working alignment. Practically the alignment requirements are the same for buffer alignment, buffer length, and offset in file. So in case we cannot detect buf_align, we can use request alignment. If we cannot detect request alignment, we can fallback to a safe value. To use this logic, we probe first request alignment instead of buf_align. Here is a table showing the behaviour with current code (the value in parenthesis is the optimal value). Case Sector buf_align (opt) request_alignment (opt) result ====================================================================== 1 512 512 (1) 512 (512) OK 1 4096 512 (1) 4096 (4096) FAIL ---------------------------------------------------------------------- 2 512 512 (512) 512 (512) OK 2 4096 512 (4096) 4096 (4096) FAIL ---------------------------------------------------------------------- 3 512 512 (1) 512 (512) OK 3 4096 512 (1) 512 (4096) FAIL ---------------------------------------------------------------------- 4 512 512 (1) 512 (1) OK 4 4096 512 (1) 512 (1) OK Same cases with this change: Case Sector buf_align (opt) request_alignment (opt) result ====================================================================== 1 512 512 (1) 512 (512) OK 1 4096 4096 (1) 4096 (4096) OK ---------------------------------------------------------------------- 2 512 512 (512) 512 (512) OK 2 4096 4096 (4096) 4096 (4096) OK ---------------------------------------------------------------------- 3 512 4096 (1) 4096 (512) OK 3 4096 4096 (1) 4096 (4096) OK ---------------------------------------------------------------------- 4 512 4096 (1) 4096 (1) OK 4 4096 4096 (1) 4096 (1) OK I tested that provisioning VMs and copying disks on local XFS and Gluster with 4k bytes sector size work now, resolving bugs [1],[2]. I tested also on XFS, NFS, Gluster with 512 bytes sector size. [1] https://bugzilla.redhat.com/1737256 [2] https://bugzilla.redhat.com/1738657 Signed-off-by: Nir Soffer <nsoffer@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-07-12file-posix: Use max transfer length/segment count only for SCSI passthroughMaxim Levitsky
Regular kernel block devices (/dev/sda*, /dev/nvme*, etc) don't have max segment size/max segment count hardware requirements exposed to the userspace, but rather the kernel block layer takes care to split the incoming requests that violate these requirements. Allowing the kernel to do the splitting allows qemu to avoid various overheads that arise otherwise from this. This is especially visible in nbd server, exposing as a raw file, a mostly empty qcow2 image over the net. In this case most of the reads by the remote user won't even hit the underlying kernel block device, and therefore most of the overhead will be in the nbd traffic which increases significantly with lower max transfer size. In addition to that even for local block device access the peformance improves a bit due to less traffic between qemu and the kernel when large transfer sizes are used (e.g for image conversion) More info can be found at: https://bugzilla.redhat.com/show_bug.cgi?id=1647104 Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Pankaj Gupta <pagupta@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-18file-posix: Update open_flags in raw_set_perm()Max Reitz
raw_check_perm() + raw_set_perm() can change the flags associated with the current FD. If so, we have to update BDRVRawState.open_flags accordingly. Otherwise, we may keep reopening the FD even though the current one already has the correct flags. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-12block/file-posix: update .help of BLOCK_OPT_PREALLOC optionStefano Garzarella
Show 'falloc' among the allowed values of 'preallocation' option, only when it is supported (if defined CONFIG_POSIX_FALLOCATE) Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190524075848.23781-3-sgarzare@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
2019-06-12Include qemu-common.h exactly where neededMarkus Armbruster
No header includes qemu-common.h after this commit, as prescribed by qemu-common.h's file comment. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190523143508.25387-5-armbru@redhat.com> [Rebased with conflicts resolved automatically, except for include/hw/arm/xlnx-zynqmp.h hw/arm/nrf51_soc.c hw/arm/msf2-soc.c block/qcow2-refcount.c block/qcow2-cluster.c block/qcow2-cache.c target/arm/cpu.h target/lm32/cpu.h target/m68k/cpu.h target/mips/cpu.h target/moxie/cpu.h target/nios2/cpu.h target/openrisc/cpu.h target/riscv/cpu.h target/tilegx/cpu.h target/tricore/cpu.h target/unicore32/cpu.h target/xtensa/cpu.h; bsd-user/main.c and net/tap-bsd.c fixed up]
2019-05-20block/file-posix: Unaligned O_DIRECT block-statusMax Reitz
Currently, qemu crashes whenever someone queries the block status of an unaligned image tail of an O_DIRECT image: $ echo > foo $ qemu-img map --image-opts driver=file,filename=foo,cache.direct=on Offset Length Mapped to File qemu-img: block/io.c:2093: bdrv_co_block_status: Assertion `*pnum && QEMU_IS_ALIGNED(*pnum, align) && align > offset - aligned_offset' failed. This is because bdrv_co_block_status() checks that the result returned by the driver's implementation is aligned to the request_alignment, but file-posix can fail to do so, which is actually mentioned in a comment there: "[...] possibly including a partial sector at EOF". Fix this by rounding up those partial sectors. There are two possible alternative fixes: (1) We could refuse to open unaligned image files with O_DIRECT altogether. That sounds reasonable until you realize that qcow2 does necessarily not fill up its metadata clusters, and that nobody runs qemu-img create with O_DIRECT. Therefore, unpreallocated qcow2 files usually have an unaligned image tail. (2) bdrv_co_block_status() could ignore unaligned tails. It actually throws away everything past the EOF already, so that sounds reasonable. Unfortunately, the block layer knows file lengths only with a granularity of BDRV_SECTOR_SIZE, so bdrv_co_block_status() usually would have to guess whether its file length information is inexact or whether the driver is broken. Fixing what raw_co_block_status() returns is the safest thing to do. There seems to be no other block driver that sets request_alignment and does not make sure that it always returns aligned values. Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-05-20block/file-posix: Truncate in xfs_write_zeroes()Max Reitz
XFS_IOC_ZERO_RANGE does not increase the file length: $ touch foo $ xfs_io -c 'zero 0 65536' foo $ stat -c "size=%s, blocks=%b" foo size=0, blocks=128 We do want writes beyond the EOF to automatically increase the file length, however. This is evidenced by the fact that iotest 061 is broken on XFS since qcow2's check implementation checks for blocks beyond the EOF. Reported-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-04-02block/file-posix: do not fail on unlock bytesVladimir Sementsov-Ogievskiy
bdrv_replace_child() calls bdrv_check_perm() with error_abort on loosening permissions. However file-locking operations may fail even in this case, for example on NFS. And this leads to Qemu crash. Let's avoid such errors. Note, that we ignore such things anyway on permission update commit and abort. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-26file-posix: Support BDRV_REQ_NO_FALLBACK for zero writesKevin Wolf
We know that the kernel implements a slow fallback code path for BLKZEROOUT, so if BDRV_REQ_NO_FALLBACK is given, we shouldn't call it. The other operations we call in the context of .bdrv_co_pwrite_zeroes should usually be quick, so no modification should be needed for them. If we ever notice that there are additional problematic cases, we can still make these conditional as well. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Eric Blake <eblake@redhat.com>
2019-03-14Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into ↵Peter Maydell
staging Pull request * Add 'drop-cache=on|off' option to file-posix.c. The default is on. Disabling the option fixes a QEMU 3.0.0 performance regression when live migrating on the same host with cache.direct=off. # gpg: Signature made Wed 13 Mar 2019 11:07:48 GMT # gpg: using RSA key 9CA4ABB381AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [full] # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" [full] # Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8 * remotes/stefanha/tags/block-pull-request: file-posix: add drop-cache=on|off option Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-03-13file-posix: add drop-cache=on|off optionStefan Hajnoczi
Commit dd577a26ff03b6829721b1ffbbf9e7c411b72378 ("block/file-posix: implement bdrv_co_invalidate_cache() on Linux") introduced page cache invalidation so that cache.direct=off live migration is safe on Linux. The invalidation takes a significant amount of time when the file is large and present in the page cache. Normally this is not the case for cross-host live migration but it can happen when migrating between QEMU processes on the same host. On same-host migration we don't need to invalidate pages for correctness anyway, so an option to skip page cache invalidation is useful. I investigated optimizing invalidation and detecting same-host migration, but both are hard to achieve so a user-visible option will suffice. As a bonus this option means that the cache invalidation feature will now be detectable by libvirt via QMP schema introspection. Suggested-by: Neil Skrypuch <neil@tembosocial.com> Tested-by: Neil Skrypuch <neil@tembosocial.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20190307164941.3322-1-stefanha@redhat.com Message-Id: <20190307164941.3322-1-stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-03-12block: Add a 'mutable_opts' field to BlockDriverAlberto Garcia
If we reopen a BlockDriverState and there is an option that is present in bs->options but missing from the new set of options then we have to return an error unless the driver is able to reset it to its default value. This patch adds a new 'mutable_opts' field to BlockDriver. This is a list of runtime options that can be modified during reopen. If an option in this list is unspecified on reopen then it must be reset (or return an error). Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-12file-posix: Make auto-read-only dynamicKevin Wolf
Until now, with auto-read-only=on we tried to open the file read-write first and if that failed, read-only was tried. This is actually not good enough for libvirt, which gives QEMU SELinux permissions for read-write only as soon as it actually intends to write to the image. So we need to be able to switch between read-only and read-write at runtime. This patch makes auto-read-only dynamic, i.e. the file is opened read-only as long as no user of the node has requested write permissions, but it is automatically reopened read-write as soon as the first writer is attached. Conversely, if the last writer goes away, the file is reopened read-only again. bs->read_only is no longer set for auto-read-only=on files even if the file descriptor is opened read-only because it will be transparently upgraded as soon as a writer is attached. This changes the output of qemu-iotests 232. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-12file-posix: Prepare permission code for fd switchingKevin Wolf
In order to be able to dynamically reopen the file read-only or read-write, depending on the users that are attached, we need to be able to switch to a different file descriptor during the permission change. This interacts with reopen, which also creates a new file descriptor and performs permission changes internally. In this case, the permission change code must reuse the reopen file descriptor instead of creating a third one. In turn, reopen can drop its code to copy file locks to the new file descriptor because that is now done when applying the new permissions. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-12file-posix: Lock new fd in raw_reopen_prepare()Kevin Wolf
There is no reason why we can take locks on the new file descriptor only in raw_reopen_commit() where error handling isn't possible any more. Instead, we can already do this in raw_reopen_prepare(). Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-12file-posix: Store BDRVRawState.reopen_state during reopenKevin Wolf
We'll want to access the file descriptor in the reopen_state while processing permission changes in the context of the repoen. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-03-12file-posix: Factor out raw_reconfigure_getfd()Kevin Wolf
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-01-31block/file-posix: Convert from DPRINTF() macro to trace eventsLaurent Vivier
Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20181213162727.17438-4-lvivier@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-01-11avoid TABs in files that only contain a fewPaolo Bonzini
Most files that have TABs only contain a handful of them. Change them to spaces so that we don't confuse people. disas, standard-headers, linux-headers and libdecnumber are imported from other projects and probably should be exempted from the check. Outside those, after this patch the following files still contain both 8-space and TAB sequences at the beginning of the line. Many of them have a majority of TABs, or were initially committed with all tabs. bsd-user/i386/target_syscall.h bsd-user/x86_64/target_syscall.h crypto/aes.c hw/audio/fmopl.c hw/audio/fmopl.h hw/block/tc58128.c hw/display/cirrus_vga.c hw/display/xenfb.c hw/dma/etraxfs_dma.c hw/intc/sh_intc.c hw/misc/mst_fpga.c hw/net/pcnet.c hw/sh4/sh7750.c hw/timer/m48t59.c hw/timer/sh_timer.c include/crypto/aes.h include/disas/bfd.h include/hw/sh4/sh.h libdecnumber/decNumber.c linux-headers/asm-generic/unistd.h linux-headers/linux/kvm.h linux-user/alpha/target_syscall.h linux-user/arm/nwfpe/double_cpdo.c linux-user/arm/nwfpe/fpa11_cpdt.c linux-user/arm/nwfpe/fpa11_cprt.c linux-user/arm/nwfpe/fpa11.h linux-user/flat.h linux-user/flatload.c linux-user/i386/target_syscall.h linux-user/ppc/target_syscall.h linux-user/sparc/target_syscall.h linux-user/syscall.c linux-user/syscall_defs.h linux-user/x86_64/target_syscall.h slirp/cksum.c slirp/if.c slirp/ip.h slirp/ip_icmp.c slirp/ip_icmp.h slirp/ip_input.c slirp/ip_output.c slirp/mbuf.c slirp/misc.c slirp/sbuf.c slirp/socket.c slirp/socket.h slirp/tcp_input.c slirp/tcpip.h slirp/tcp_output.c slirp/tcp_subr.c slirp/tcp_timer.c slirp/tftp.c slirp/udp.c slirp/udp.h target/cris/cpu.h target/cris/mmu.c target/cris/op_helper.c target/sh4/helper.c target/sh4/op_helper.c target/sh4/translate.c tcg/sparc/tcg-target.inc.c tests/tcg/cris/check_addo.c tests/tcg/cris/check_moveq.c tests/tcg/cris/check_swap.c tests/tcg/multiarch/test-mmap.c ui/vnc-enc-hextile-template.h ui/vnc-enc-zywrle.h util/envlist.c util/readline.c The following have only TABs: bsd-user/i386/target_signal.h bsd-user/sparc64/target_signal.h bsd-user/sparc64/target_syscall.h bsd-user/sparc/target_signal.h bsd-user/sparc/target_syscall.h bsd-user/x86_64/target_signal.h crypto/desrfb.c hw/audio/intel-hda-defs.h hw/core/uboot_image.h hw/sh4/sh7750_regnames.c hw/sh4/sh7750_regs.h include/hw/cris/etraxfs_dma.h linux-user/alpha/termbits.h linux-user/arm/nwfpe/fpopcode.h linux-user/arm/nwfpe/fpsr.h linux-user/arm/syscall_nr.h linux-user/arm/target_signal.h linux-user/cris/target_signal.h linux-user/i386/target_signal.h linux-user/linux_loop.h linux-user/m68k/target_signal.h linux-user/microblaze/target_signal.h linux-user/mips64/target_signal.h linux-user/mips/target_signal.h linux-user/mips/target_syscall.h linux-user/mips/termbits.h linux-user/ppc/target_signal.h linux-user/sh4/target_signal.h linux-user/sh4/termbits.h linux-user/sparc64/target_syscall.h linux-user/sparc/target_signal.h linux-user/x86_64/target_signal.h linux-user/x86_64/termbits.h pc-bios/optionrom/optionrom.h slirp/mbuf.h slirp/misc.h slirp/sbuf.h slirp/tcp.h slirp/tcp_timer.h slirp/tcp_var.h target/i386/svm.h target/sparc/asi.h target/xtensa/core-dc232b/xtensa-modules.inc.c target/xtensa/core-dc233c/xtensa-modules.inc.c target/xtensa/core-de212/core-isa.h target/xtensa/core-de212/xtensa-modules.inc.c target/xtensa/core-fsf/xtensa-modules.inc.c target/xtensa/core-sample_controller/core-isa.h target/xtensa/core-sample_controller/xtensa-modules.inc.c target/xtensa/core-test_kc705_be/core-isa.h target/xtensa/core-test_kc705_be/xtensa-modules.inc.c tests/tcg/cris/check_abs.c tests/tcg/cris/check_addc.c tests/tcg/cris/check_addcm.c tests/tcg/cris/check_addoq.c tests/tcg/cris/check_bound.c tests/tcg/cris/check_ftag.c tests/tcg/cris/check_int64.c tests/tcg/cris/check_lz.c tests/tcg/cris/check_openpf5.c tests/tcg/cris/check_sigalrm.c tests/tcg/cris/crisutils.h tests/tcg/cris/sys.c tests/tcg/i386/test-i386-ssse3.c ui/vgafont.h Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20181213223737.11793-3-pbonzini@redhat.com> Reviewed-by: Aleksandar Markovic <amarkovic@wavecomp.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Wainer dos Santos Moschetta <wainersm@redhat.com> Acked-by: Richard Henderson <richard.henderson@linaro.org> Acked-by: Eric Blake <eblake@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Stefan Markovic <smarkovic@wavecomp.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14file-posix: Avoid aio_worker() for QEMU_AIO_IOCTLKevin Wolf
aio_worker() doesn't add anything interesting, it's only a useless indirection. Call the handler function directly instead. As we know that this handler function is only called from coroutine context and the coroutine stays around until the worker thread finishes, we can keep RawPosixAIOData on the stack. This was the last user of aio_worker(), so the function goes away now. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14file-posix: Switch to .bdrv_co_ioctlKevin Wolf
No real reason to keep using the callback based mechanism here when the rest of the file-posix driver is coroutine based. Changing it brings ioctls more in line with how other request types work. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14file-posix: Remove paio_submit_co()Kevin Wolf
The function is not used any more, remove it. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14file-posix: Avoid aio_worker() for QEMU_AIO_READ/WRITEKevin Wolf
aio_worker() doesn't add anything interesting, it's only a useless indirection. Call the handler function directly instead. As we know that this handler function is only called from coroutine context and the coroutine stays around until the worker thread finishes, we can keep RawPosixAIOData on the stack. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14file-posix: Move read/write operation logic out of aio_worker()Kevin Wolf
aio_worker() for reads and writes isn't boring enough yet. It still does some postprocessing for handling short reads and turning the result into the right return value. However, there is no reason why handle_aiocb_rw() couldn't do the same, and even without duplicating code between the read and write path. So move the code there. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14file-posix: Avoid aio_worker() for QEMU_AIO_FLUSHKevin Wolf
aio_worker() doesn't add anything interesting, it's only a useless indirection. Call the handler function directly instead. As we know that this handler function is only called from coroutine context and the coroutine stays around until the worker thread finishes, we can keep RawPosixAIOData on the stack. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14file-posix: Avoid aio_worker() for QEMU_AIO_DISCARDKevin Wolf
aio_worker() doesn't add anything interesting, it's only a useless indirection. Call the handler function directly instead. As we know that this handler function is only called from coroutine context and the coroutine stays around until the worker thread finishes, we can keep RawPosixAIOData on the stack. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14file-posix: Avoid aio_worker() for QEMU_AIO_WRITE_ZEROESKevin Wolf
aio_worker() doesn't add anything interesting, it's only a useless indirection. Call the handler function directly instead. As we know that this handler function is only called from coroutine context and the coroutine stays around until the worker thread finishes, we can keep RawPosixAIOData on the stack. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14file-posix: Avoid aio_worker() for QEMU_AIO_COPY_RANGEKevin Wolf
aio_worker() doesn't add anything interesting, it's only a useless indirection. Call the handler function directly instead. As we know that this handler function is only called from coroutine context and the coroutine stays around until the worker thread finishes, we can keep RawPosixAIOData on the stack. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14file-posix: Avoid aio_worker() for QEMU_AIO_TRUNCATEKevin Wolf
aio_worker() doesn't add anything interesting, it's only a useless indirection. Call the handler function directly instead. As we know that this handler function is only called from coroutine context and the coroutine stays around until the worker thread finishes, we can keep RawPosixAIOData on the stack. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14file-posix: Factor out raw_thread_pool_submit()Kevin Wolf
Getting the thread pool of the AioContext of a block node and scheduling some work in it is an operation that is already done twice, and we'll get more instances. Factor it out into a separate function. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-12-14file-posix: Reorganise RawPosixAIODataKevin Wolf
RawPosixAIOData contains a lot of fields for several separate operations that are to be processed in a worker thread and that need different parameters. The struct is currently rather unorganised, with unions that cover some, but not all operations, and even one #define for field names instead of a union. Clean this up to have some common fields and a single union. As a side effect, on x86_64 the struct shrinks from 72 to 48 bytes. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-11-19file-posix: Fix shared locks on reopen commitMax Reitz
s->locked_shared_perm is the set of bits locked in the file, which is the inverse of the permissions actually shared. So we need to pass them as they are to raw_apply_lock_bytes() instead of inverting them again. Reported-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-11-12file-posix: Drop s->lock_fdFam Zheng
The lock_fd field is not strictly necessary because transferring locked bytes from old fd to the new one shouldn't fail anyway. This spares the user one fd per image. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-11-12file-posix: Skip effectiveless OFD lock operationsFam Zheng
If we know we've already locked the bytes, don't do it again; similarly don't unlock a byte if we haven't locked it. This doesn't change the behavior, but fixes a corner case explained below. Libvirt had an error handling bug that an image can get its (ownership, file mode, SELinux) permissions changed (RHBZ 1584982) by mistake behind QEMU. Specifically, an image in use by Libvirt VM has: $ ls -lhZ b.img -rw-r--r--. qemu qemu system_u:object_r:svirt_image_t:s0:c600,c690 b.img Trying to attach it a second time won't work because of image locking. And after the error, it becomes: $ ls -lhZ b.img -rw-r--r--. root root system_u:object_r:virt_image_t:s0 b.img Then, we won't be able to do OFD lock operations with the existing fd. In other words, the code such as in blk_detach_dev: blk_set_perm(blk, 0, BLK_PERM_ALL, &error_abort); can abort() QEMU, out of environmental changes. This patch is an easy fix to this and the change is regardlessly reasonable, so do it. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-11-12file-posix: Use error API properlyFam Zheng
Use error_report for situations that affect user operation (i.e. we're actually returning error), and warn_report/warn_report_err when some less critical error happened but the user operation can still carry on. For raw_normalize_devicepath, add Error parameter to propagate to its callers. Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-11-05file-posix: Support auto-read-only optionKevin Wolf
If read-only=off, but auto-read-only=on is given, open the file read-write if we have the permissions, but instead of erroring out for read-only files, just degrade to read-only. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2018-10-01file-posix: Forbid trying to change unsupported options during reopenAlberto Garcia
The file-posix code is used for the "file", "host_device" and "host_cdrom" drivers, and it allows reopening images. However the only option that is actually processed is "x-check-cache-dropped", and changes in all other options (e.g. "filename") are silently ignored: (qemu) qemu-io virtio0 "reopen -o file.filename=no-such-file" While we could allow changing some of the other options, let's keep things as they are for now but return an error if the user tries to change any of them. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-10-01file-posix: x-check-cache-dropped should default to false on reopenAlberto Garcia
The default value of x-check-cache-dropped is false. There's no reason to use the previous value as a default in raw_reopen_prepare() because bdrv_reopen_queue_child() already takes care of putting the old options in the BDRVReopenState.options QDict. If x-check-cache-dropped was previously set but is now missing from the reopen QDict then it should be reset to false. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-10-01file-posix: Include filename in locking error messageFam Zheng
Image locking errors happening at device initialization time doesn't say which file cannot be locked, for instance, -device scsi-disk,drive=drive-1: Failed to get shared "write" lock Is another process using the image? could refer to either the overlay image or its backing image. Hoist the error_append_hint to the caller of raw_check_lock_bytes where file name is known, and include it in the error hint. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-07-30file-posix: Fix write_zeroes with unmap on block devicesKevin Wolf
The BLKDISCARD ioctl doesn't guarantee that the discarded blocks read as all-zero afterwards, so don't try to abuse it for zero writing. We try to only use this if BLKDISCARDZEROES tells us that it is safe, but this is unreliable on older kernels and a constant 0 in newer kernels. In other words, this code path is never actually used with newer kernels, so we don't even try to unmap while writing zeros. This patch removes the abuse of discard for writing zeroes from file-posix and instead adds a new function that uses interfaces that are actually meant to deallocate and zero out at the same time. Only if those fail, it falls back to zeroing out without unmap. We never fall back to a discard operation any more that may or may not result in zeros. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-07-30file-posix: Handle EINTR in preallocation=full writeFam Zheng
Cc: qemu-stable@nongnu.org Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-07-24block/file-posix: add bdrv_attach_aio_context callback for host dev and cdromNishanth Aravamudan
In ed6e2161 ("linux-aio: properly bubble up errors from initialzation"), I only added a bdrv_attach_aio_context callback for the bdrv_file driver. There are several other drivers that use the shared aio_plug callback, though, and they will trip the assertion added to aio_get_linux_aio because they did not call aio_setup_linux_aio first. Add the appropriate callback definition to the affected driver definitions. Fixes: ed6e2161 ("linux-aio: properly bubble up errors from initialization") Reported-by: Farhan Ali <alifm@linux.ibm.com> Signed-off-by: Nishanth Aravamudan <naravamudan@digitalocean.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20180718211256.29774-1-naravamudan@digitalocean.com Cc: Eric Blake <eblake@redhat.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: John Snow <jsnow@redhat.com> Cc: Max Reitz <mreitz@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Fam Zheng <famz@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: qemu-block@nongnu.org Cc: qemu-devel@nongnu.org Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-07-12file-posix: specify expected filetypesJohn Snow
Adjust each caller of raw_open_common to specify if they are expecting host and character devices or not. Tighten expectations of file types upon open in the common code and refuse types that are not expected. This has two effects: (1) Character and block devices are now considered deprecated for the 'file' driver, which expects only S_IFREG, and (2) no file-posix driver (file, host_cdrom, or host_device) can open directories now. I don't think there's a legitimate reason to open directories as if they were files. This prevents QEMU from opening and attempting to probe a directory inode, which can break in exciting ways. One of those ways is lseek on ext4/xfs, which will return 0x7fffffffffffffff as the file size instead of EISDIR. This can coax QEMU into responding with a confusing "file too big" instead of "Hey, that's not a file". See: https://bugs.launchpad.net/qemu/+bug/1739304/ Signed-off-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-07-10Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into stagingPeter Maydell
Block layer patches: - Copy offloading fixes for when the copy increases the image size - Temporary revert of the removal of deprecated -drive options - Fix request serialisation in the image fleecing scenario - Fix copy-on-read crash with unaligned image size - Fix another drain crash # gpg: Signature made Tue 10 Jul 2018 16:37:52 BST # gpg: using RSA key 7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (24 commits) block: Use common write req handling in truncate block: Fix bdrv_co_truncate overlap check block: Use common req handling in copy offloading block: Use common req handling for discard block: Fix handling of image enlarging write block: Extract common write req handling block: Use uint64_t for BdrvTrackedRequest byte fields block: Use BdrvChild to discard block: Add copy offloading trace points block: Prefix file driver trace points with "file_" Revert "block: Remove deprecated -drive geometry options" Revert "block: Remove deprecated -drive option addr" Revert "block: Remove deprecated -drive option serial" Revert "block: Remove dead deprecation warning code" block/blklogwrites: Make sure the log sector size is not too small qapi/block-core.json: Add missing documentation for blklogwrites log-append option block/backup: fix fleecing scheme: use serialized writes block: add BDRV_REQ_SERIALISING flag block: split flags in copy_range block/io: fix copy_range ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-07-10block: Add copy offloading trace pointsFam Zheng
A few trace points that can help reveal what is happening in a copy offloading I/O path. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>