aboutsummaryrefslogtreecommitdiff
path: root/block/file-posix.c
AgeCommit message (Collapse)Author
2018-10-01file-posix: Forbid trying to change unsupported options during reopenAlberto Garcia
The file-posix code is used for the "file", "host_device" and "host_cdrom" drivers, and it allows reopening images. However the only option that is actually processed is "x-check-cache-dropped", and changes in all other options (e.g. "filename") are silently ignored: (qemu) qemu-io virtio0 "reopen -o file.filename=no-such-file" While we could allow changing some of the other options, let's keep things as they are for now but return an error if the user tries to change any of them. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-10-01file-posix: x-check-cache-dropped should default to false on reopenAlberto Garcia
The default value of x-check-cache-dropped is false. There's no reason to use the previous value as a default in raw_reopen_prepare() because bdrv_reopen_queue_child() already takes care of putting the old options in the BDRVReopenState.options QDict. If x-check-cache-dropped was previously set but is now missing from the reopen QDict then it should be reset to false. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-10-01file-posix: Include filename in locking error messageFam Zheng
Image locking errors happening at device initialization time doesn't say which file cannot be locked, for instance, -device scsi-disk,drive=drive-1: Failed to get shared "write" lock Is another process using the image? could refer to either the overlay image or its backing image. Hoist the error_append_hint to the caller of raw_check_lock_bytes where file name is known, and include it in the error hint. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-07-30file-posix: Fix write_zeroes with unmap on block devicesKevin Wolf
The BLKDISCARD ioctl doesn't guarantee that the discarded blocks read as all-zero afterwards, so don't try to abuse it for zero writing. We try to only use this if BLKDISCARDZEROES tells us that it is safe, but this is unreliable on older kernels and a constant 0 in newer kernels. In other words, this code path is never actually used with newer kernels, so we don't even try to unmap while writing zeros. This patch removes the abuse of discard for writing zeroes from file-posix and instead adds a new function that uses interfaces that are actually meant to deallocate and zero out at the same time. Only if those fail, it falls back to zeroing out without unmap. We never fall back to a discard operation any more that may or may not result in zeros. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-07-30file-posix: Handle EINTR in preallocation=full writeFam Zheng
Cc: qemu-stable@nongnu.org Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-07-24block/file-posix: add bdrv_attach_aio_context callback for host dev and cdromNishanth Aravamudan
In ed6e2161 ("linux-aio: properly bubble up errors from initialzation"), I only added a bdrv_attach_aio_context callback for the bdrv_file driver. There are several other drivers that use the shared aio_plug callback, though, and they will trip the assertion added to aio_get_linux_aio because they did not call aio_setup_linux_aio first. Add the appropriate callback definition to the affected driver definitions. Fixes: ed6e2161 ("linux-aio: properly bubble up errors from initialization") Reported-by: Farhan Ali <alifm@linux.ibm.com> Signed-off-by: Nishanth Aravamudan <naravamudan@digitalocean.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20180718211256.29774-1-naravamudan@digitalocean.com Cc: Eric Blake <eblake@redhat.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: John Snow <jsnow@redhat.com> Cc: Max Reitz <mreitz@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Fam Zheng <famz@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: qemu-block@nongnu.org Cc: qemu-devel@nongnu.org Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-07-12file-posix: specify expected filetypesJohn Snow
Adjust each caller of raw_open_common to specify if they are expecting host and character devices or not. Tighten expectations of file types upon open in the common code and refuse types that are not expected. This has two effects: (1) Character and block devices are now considered deprecated for the 'file' driver, which expects only S_IFREG, and (2) no file-posix driver (file, host_cdrom, or host_device) can open directories now. I don't think there's a legitimate reason to open directories as if they were files. This prevents QEMU from opening and attempting to probe a directory inode, which can break in exciting ways. One of those ways is lseek on ext4/xfs, which will return 0x7fffffffffffffff as the file size instead of EISDIR. This can coax QEMU into responding with a confusing "file too big" instead of "Hey, that's not a file". See: https://bugs.launchpad.net/qemu/+bug/1739304/ Signed-off-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-07-10Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into stagingPeter Maydell
Block layer patches: - Copy offloading fixes for when the copy increases the image size - Temporary revert of the removal of deprecated -drive options - Fix request serialisation in the image fleecing scenario - Fix copy-on-read crash with unaligned image size - Fix another drain crash # gpg: Signature made Tue 10 Jul 2018 16:37:52 BST # gpg: using RSA key 7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (24 commits) block: Use common write req handling in truncate block: Fix bdrv_co_truncate overlap check block: Use common req handling in copy offloading block: Use common req handling for discard block: Fix handling of image enlarging write block: Extract common write req handling block: Use uint64_t for BdrvTrackedRequest byte fields block: Use BdrvChild to discard block: Add copy offloading trace points block: Prefix file driver trace points with "file_" Revert "block: Remove deprecated -drive geometry options" Revert "block: Remove deprecated -drive option addr" Revert "block: Remove deprecated -drive option serial" Revert "block: Remove dead deprecation warning code" block/blklogwrites: Make sure the log sector size is not too small qapi/block-core.json: Add missing documentation for blklogwrites log-append option block/backup: fix fleecing scheme: use serialized writes block: add BDRV_REQ_SERIALISING flag block: split flags in copy_range block/io: fix copy_range ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-07-10block: Add copy offloading trace pointsFam Zheng
A few trace points that can help reveal what is happening in a copy offloading I/O path. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-07-10block: Prefix file driver trace points with "file_"Fam Zheng
With in one module, trace points usually have a common prefix named after the module name. paio_submit and paio_submit_co are the only two trace points so far in the two file protocol drivers. As we are adding more, having a common prefix here is better so that trace points can be enabled with a glob. Rename them. Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-07-10block: split flags in copy_rangeVladimir Sementsov-Ogievskiy
Pass read flags and write flags separately. This is needed to handle coming BDRV_REQ_NO_SERIALISING clearly in following patches. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-07-09file-posix: Fix fd_open check in raw_co_copy_range_toFam Zheng
One of them is a typo. But update both to be more readable. Reported-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 20180702025836.20957-3-famz@redhat.com Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-07-05file-posix: Unlock FD after creationMax Reitz
Closing the FD does not necessarily mean that it is unlocked. Fix this by relinquishing all permission locks before qemu_close(). Reported-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-07-05file-posix: Fix creation lockingMax Reitz
raw_apply_lock_bytes() takes a bit mask of "permissions that are NOT shared". Also, make the "perm" and "shared" variables uint64_t, because I do not particularly like using ~ on signed integers (and other permission masks are usually uint64_t, too). Reported-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-29file-posix: Fix EINTR handlingFam Zheng
EINTR should be checked against errno, not ret. While fixing the bug, collect the branches with a switch block. Also, change the return value from -ENOSTUP to -ENOSPC when the actual issue is request range passes EOF, which should be distinguishable from the case of error == ENOSYS by the caller, so that it could still retry with other byte ranges, whereas it shouldn't retry anymore upon ENOSYS. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-29file-posix: Implement co versions of discard/flushKevin Wolf
This simplifies file-posix by implementing the coroutine variants of the discard and flush BlockDriver callbacks. These were the last remaining users of paio_submit(), which can be removed now. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2018-06-29file-posix: Make .bdrv_co_truncate asynchronousKevin Wolf
This moves the code to resize an image file to the thread pool to avoid blocking. Creating large images with preallocation with blockdev-create is now actually a background job instead of blocking the monitor (and most other things) until the preallocation has completed. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-29block: Convert .bdrv_truncate callback to coroutine_fnKevin Wolf
bdrv_truncate() is an operation that can block (even for a quite long time, depending on the PreallocMode) in I/O paths that shouldn't block. Convert it to a coroutine_fn so that we have the infrastructure for drivers to make their .bdrv_co_truncate implementation asynchronous. This change could potentially introduce new race conditions because bdrv_truncate() isn't necessarily executed atomically any more. Whether this is a problem needs to be evaluated for each block driver that supports truncate: * file-posix/win32, gluster, iscsi, nfs, rbd, ssh, sheepdog: The protocol drivers are trivially safe because they don't actually yield yet, so there is no change in behaviour. * copy-on-read, crypto, raw-format: Essentially just filter drivers that pass the request to a child node, no problem. * qcow2: The implementation modifies metadata, so it needs to hold s->lock to be safe with concurrent I/O requests. In order to avoid double locking, this requires pulling the locking out into preallocate_co() and using qcow2_write_caches() instead of bdrv_flush(). * qed: Does a single header update, this is fine without locking. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-27linux-aio: properly bubble up errors from initializationNishanth Aravamudan
laio_init() can fail for a couple of reasons, which will lead to a NULL pointer dereference in laio_attach_aio_context(). To solve this, add a aio_setup_linux_aio() function which is called early in raw_open_common. If this fails, propagate the error up. The signature of aio_get_linux_aio() was not modified, because it seems preferable to return the actual errno from the possible failing initialization calls. Additionally, when the AioContext changes, we need to associate a LinuxAioState with the new AioContext. Use the bdrv_attach_aio_context callback and call the new aio_setup_linux_aio(), which will allocate a new AioContext if needed, and return errors on failures. If it fails for any reason, fallback to threaded AIO with an error message, as the device is already in-use by the guest. Add an assert that aio_get_linux_aio() cannot return NULL. Signed-off-by: Nishanth Aravamudan <naravamudan@digitalocean.com> Message-id: 20180622193700.6523-1-naravamudan@digitalocean.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-11block/file-posix: File locking during creationMax Reitz
When creating a file, we should take the WRITE and RESIZE permissions. We do not need either for the creation itself, but we do need them for clearing and resizing it. So we can take the proper permissions by replacing O_TRUNC with an explicit truncation to 0, and by taking the appropriate file locks between those two steps. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20180509215336.31304-3-mreitz@redhat.com Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-11block/file-posix: Pass FD to locking helpersMax Reitz
raw_apply_lock_bytes() and raw_check_lock_bytes() currently take a BDRVRawState *, but they only use the lock_fd field. During image creation, we do not have a BDRVRawState, but we do have an FD; so if we want to reuse the functions there, we should modify them to receive only the FD. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20180509215336.31304-2-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-01file-posix: Implement bdrv_co_copy_rangeFam Zheng
With copy_file_range(2), we can implement the bdrv_co_copy_range semantics. Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 20180601092648.24614-6-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-05-11block/file-posix: add x-check-page-cache=on|off optionStefan Hajnoczi
mincore(2) checks whether pages are resident. Use it to verify that page cache has been dropped. You can trigger a verification failure by mmapping the image file from another process that loads a byte from a page, forcing it to become resident. bdrv_co_invalidate_cache() will fail while that process is alive. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20180427162312.18583-3-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-05-11block/file-posix: implement bdrv_co_invalidate_cache() on LinuxStefan Hajnoczi
On Linux posix_fadvise(POSIX_FADV_DONTNEED) invalidates pages*. Use this to drop page cache on the destination host during shared storage migration. This way the destination host will read the latest copy of the data and will not use stale data from the page cache. The flow is as follows: 1. Source host writes out all dirty pages and inactivates drives. 2. QEMU_VM_EOF is sent on migration stream. 3. Destination host invalidates caches before accessing drives. This patch enables live migration even with -drive cache.direct=off. * Terms and conditions may apply, please see patch for details. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20180427162312.18583-2-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-04-03block/file-posix: Fix fully preallocated truncateMax Reitz
Storing the lseek() result in an int results in it overflowing when the file is at least 2 GB big. Then, we have a 50 % chance of the result being "negative" and thus thinking an error occurred when actually everything went just fine. So we should use the correct type for storing the result: off_t. Reported-by: Daniel P. Berrange <berrange@redhat.com> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1549231 Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20180228131315.30194-2-mreitz@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-04-03block: handle invalid lseek returns gracefullyJeff Cody
In commit 223a23c198787328ae75bc65d84edf5fde33c0b6, we implemented a workaround in the gluster driver to handle invalid values returned for SEEK_DATA or SEEK_HOLE. In some instances, these same invalid values can be seen in the posix file handler as well - for example, it has been reported on FUSE gluster mounts. Calling assert() for these invalid values is overly harsh; we can safely return -EIO and allow this case to be treated as a "learned nothing" case (e.g., D4 / H4, as commented in the code). This patch does the same thing that 223a23c198787 did for gluster.c, except in file-posix.c Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-09file-posix: Fix no-op bdrv_truncate() with falloc preallocationKevin Wolf
If bdrv_truncate() is called, but the requested size is the same as before, don't call posix_fallocate(), which returns -EINVAL for length zero and would therefore make bdrv_truncate() fail. The problem can be triggered by creating a zero-sized raw image with 'falloc' preallocation mode. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2018-03-09file-posix: Support .bdrv_co_createKevin Wolf
This adds the .bdrv_co_create driver callback to file, which enables image creation over QMP. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2018-03-02block: rename .bdrv_create() to .bdrv_co_create_opts()Stefan Hajnoczi
BlockDriver->bdrv_create() has been called from coroutine context since commit 5b7e1542cfa41a281af9629d31cef03704d976e6 ("block: make bdrv_create adopt coroutine"). Make this explicit by renaming to .bdrv_co_create_opts() and add the coroutine_fn annotation. This makes it obvious to block driver authors that they may yield, use CoMutex, or other coroutine_fn APIs. bdrv_co_create is reserved for the QAPI-based version that Kevin is working on. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20170705102231.20711-2-stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-03-02file-posix: Switch to .bdrv_co_block_status()Eric Blake
We are gradually moving away from sector-based interfaces, towards byte-based. Update the file protocol driver accordingly. In want_zero mode, we continue to report fine-grained hole information (the caller wants as much mapping detail as possible); but when not in that mode, the caller prefers larger *pnum and merely cares about what offsets are allocated at this layer, rather than where the holes live. Since holes still read as zeroes at this layer (rather than deferring to a backing layer), we can take the shortcut of skipping lseek(), and merely state that all bytes are allocated. We can also drop redundant bounds checks that are already guaranteed by the block layer. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-02-09block: Simplify bdrv_can_write_zeroes_with_unmap()Eric Blake
We don't need the can_write_zeroes_with_unmap field in BlockDriverInfo, because it is redundant information with supported_zero_flags & BDRV_REQ_MAY_UNMAP. Note that BlockDriverInfo and supported_zero_flags are both per-device settings, rather than global state about the driver as a whole, which means one or both of these bits of information can already be conditional. Let's audit how they were set: crypto: always setting can_write_ to false is pointless (the struct starts life zero-initialized), no use of supported_ nbd: just recently fixed to set can_write_ if supported_ includes MAY_UNMAP (thus this commit effectively reverts bca80059e and solves the problem mentioned there in a more global way) file-posix, iscsi, qcow2: can_write_ is conditional, while supported_ was unconditional; but passing MAY_UNMAP would fail with ENOTSUP if the condition wasn't met qed: can_write_ is unconditional, but pwrite_zeroes lacks support for MAY_UNMAP and supported_ is not set. Perhaps support can be added later (since it would be similar to qcow2), but for now claiming false is no real loss all other drivers: can_write_ is not set, and supported_ is either unset or a passthrough Simplify the code by moving the conditional into supported_zero_flags for all drivers, then dropping the now-unused BDI field. For callers that relied on bdrv_can_write_zeroes_with_unmap(), we return the same per-device settings for drivers that had conditions (no observable change in behavior there); and can now return true (instead of false) for drivers that support passthrough (for example, the commit driver) which gives those drivers the same fix as nbd just got in bca80059e. For callers that relied on supported_zero_flags, we now have a few more places that can avoid a wasted call to pwrite_zeroes() that will just fail with ENOTSUP. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20180126193439.20219-1-eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-02-09Move include qemu/option.h from qemu-common.h to actual usersMarkus Armbruster
qemu-common.h includes qemu/option.h, but most places that include the former don't actually need the latter. Drop the include, and add it to the places that actually need it. While there, drop superfluous includes of both headers, and separate #include from file comment with a blank line. This cleanup makes the number of objects depending on qemu/option.h drop from 4545 (out of 4743) to 284 in my "build everything" tree. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180201111846.21846-20-armbru@redhat.com> [Semantic conflict with commit bdd6a90a9e in block/nvme.c resolved]
2018-02-09Include qapi/qmp/qdict.h exactly where neededMarkus Armbruster
This cleanup makes the number of objects depending on qapi/qmp/qdict.h drop from 4550 (out of 4743) to 368 in my "build everything" tree. For qapi/qmp/qobject.h, the number drops from 4552 to 390. While there, separate #include from file comment with a blank line. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180201111846.21846-13-armbru@redhat.com>
2017-09-26file-posix: Clear out first sector in hdev_createFam Zheng
People get surprised when, after "qemu-img create -f raw /dev/sdX", they still see qcow2 with "qemu-img info", if previously the bdev had a qcow2 header. While this is natural because raw doesn't need to write any magic bytes during creation, hdev_create is free to clear out the first sector to make sure the stale qcow2 header doesn't cause such confusion. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-09-22scsi, file-posix: add support for persistent reservation managementPaolo Bonzini
It is a common requirement for virtual machine to send persistent reservations, but this currently requires either running QEMU with CAP_SYS_RAWIO, or using out-of-tree patches that let an unprivileged QEMU bypass Linux's filter on SG_IO commands. As an alternative mechanism, the next patches will introduce a privileged helper to run persistent reservation commands without expanding QEMU's attack surface unnecessarily. The helper is invoked through a "pr-manager" QOM object, to which file-posix.c passes SG_IO requests for PERSISTENT RESERVE OUT and PERSISTENT RESERVE IN commands. For example: $ qemu-system-x86_64 -device virtio-scsi \ -object pr-manager-helper,id=helper0,path=/var/run/qemu-pr-helper.sock -drive if=none,id=hd,driver=raw,file.filename=/dev/sdb,file.pr-manager=helper0 -device scsi-block,drive=hd or: $ qemu-system-x86_64 -device virtio-scsi \ -object pr-manager-helper,id=helper0,path=/var/run/qemu-pr-helper.sock -blockdev node-name=hd,driver=raw,file.driver=host_device,file.filename=/dev/sdb,file.pr-manager=helper0 -device scsi-block,drive=hd Multiple pr-manager implementations are conceivable and possible, though only one is implemented right now. For example, a pr-manager could: - talk directly to the multipath daemon from a privileged QEMU (i.e. QEMU links to libmpathpersist); this makes reservation work properly with multipath, but still requires CAP_SYS_RAWIO - use the Linux IOC_PR_* ioctls (they require CAP_SYS_ADMIN though) - more interestingly, implement reservations directly in QEMU through file system locks or a shared database (e.g. sqlite) Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-04qapi: Change data type of the FOO_lookup generated for enum FOOMarc-André Lureau
Currently, a FOO_lookup is an array of strings terminated by a NULL sentinel. A future patch will generate enums with "holes". NULL-termination will cease to work then. To prepare for that, store the length in the FOO_lookup by wrapping it in a struct and adding a member for the length. The sentinel will be dropped next. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20170822132255.23945-13-marcandre.lureau@redhat.com> [Basically redone] Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1503564371-26090-16-git-send-email-armbru@redhat.com> [Rebased]
2017-09-04qapi: Mechanically convert FOO_lookup[...] to FOO_str(...)Markus Armbruster
Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1503564371-26090-14-git-send-email-armbru@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2017-09-04qapi: Generate FOO_str() macro for QAPI enum FOOMarkus Armbruster
The next commit will put it to use. May look pointless now, but we're going to change the FOO_lookup's type, and then it'll help. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1503564371-26090-13-git-send-email-armbru@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2017-09-04qapi: Drop superfluous qapi_enum_parse() parameter maxMarkus Armbruster
The lookup tables have a sentinel, no need to make callers pass their size. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1503564371-26090-3-git-send-email-armbru@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> [Rebased, commit message corrected]
2017-08-11file-posix: Do runtime check for ofd lock APIFam Zheng
It is reported that on Windows Subsystem for Linux, ofd operations fail with -EINVAL. In other words, QEMU binary built with system headers that exports F_OFD_SETLK doesn't necessarily run in an environment that actually supports it: $ qemu-system-aarch64 ... -drive file=test.vhdx,if=none,id=hd0 \ -device virtio-blk-pci,drive=hd0 qemu-system-aarch64: -drive file=test.vhdx,if=none,id=hd0: Failed to unlock byte 100 qemu-system-aarch64: -drive file=test.vhdx,if=none,id=hd0: Failed to unlock byte 100 qemu-system-aarch64: -drive file=test.vhdx,if=none,id=hd0: Failed to lock byte 100 As a matter of fact this is not WSL specific. It can happen when running a QEMU compiled against a newer glibc on an older kernel, such as in a containerized environment. Let's do a runtime check to cope with that. Reported-by: Andrew Baumann <Andrew.Baumann@microsoft.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-08-08block: respect error code from bdrv_getlength in handle_aiocb_write_zeroesDenis V. Lunev
Original idea beyond the code in question was the following: we have failed to write zeroes with fallocate(FALLOC_FL_ZERO_RANGE) as the simplest approach and via fallocate(FALLOC_FL_PUNCH_HOLE)/fallocate(0). We have the only chance now: if the request comes beyond end of the file. Thus we should calculate file length and respect the error code from that op. Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Markus Armbruster <armbru@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2017-07-11block/file-posix: Preallocation for truncateMax Reitz
By using raw_regular_truncate() in raw_truncate(), we can now easily support preallocation. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20170613202107.10125-9-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-07-11block/file-posix: Generalize raw_regular_truncateMax Reitz
Currently, raw_regular_truncate() is intended for setting the size of a newly created file. However, we also want to use it for truncating an existing file in which case only the newly added space (when growing) should be preallocated. This also means that if resizing failed, we should try to restore the original file size. This is important when using preallocation. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20170613202107.10125-8-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-07-11block/file-posix: Extract raw_regular_truncate()Max Reitz
This functionality is part of raw_create() which we will be able to reuse nicely in raw_truncate(). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 20170613202107.10125-7-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-07-11block/file-posix: Small fixes in raw_create()Max Reitz
Variables should be declared at the start of a block, and if a certain parameter value is not supported it may be better to return -ENOTSUP instead of -EINVAL. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 20170613202107.10125-6-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-07-11block: Add PreallocMode to BD.bdrv_truncate()Max Reitz
Add a PreallocMode parameter to the bdrv_truncate() function implemented by each block driver. Currently, we always pass PREALLOC_MODE_OFF and no driver accepts anything else. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20170613202107.10125-2-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-06-26block: change variable names in BlockDriverStateManos Pitsidianakis
Change the 'int count' parameter in *pwrite_zeros, *pdiscard related functions (and some others) to 'int bytes', as they both refer to bytes. This helps with code legibility. Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Message-id: 20170609101808.13506-1-el13635@mail.ntua.gr Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-05-29block/file-*: *_parse_filename() and colonsMax Reitz
The file drivers' *_parse_filename() implementations just strip the optional protocol prefix off the filename. However, for e.g. "file:foo:bar", this would lead to "foo:bar" being stored as the BDS's filename which looks like it should be managed using the "foo" protocol. This is especially troublesome if you then try to resolve a backing filename based on "foo:bar". This issue can only occur if the stripped part is a relative filename ("file:/foo:bar" will be shortened to "/foo:bar" and having a slash before the first colon means that "/foo" is not recognized as a protocol part). Therefore, we can easily fix it by prepending "./" to such filenames. Before this patch: $ ./qemu-img create -f qcow2 backing.qcow2 64M Formatting 'backing.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16 $ ./qemu-img create -f qcow2 -b backing.qcow2 file:top:image.qcow2 Formatting 'file:top:image.qcow2', fmt=qcow2 size=67108864 backing_file=backing.qcow2 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16 $ ./qemu-io file:top:image.qcow2 can't open device file:top:image.qcow2: Could not open backing file: Unknown protocol 'top' After this patch: $ ./qemu-io file:top:image.qcow2 [no error] Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20170522195217.12991-3-mreitz@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-05-11file-posix: Remove .bdrv_inactivate/invalidate_cacheKevin Wolf
Now that the block layer takes care to request a lot less permissions for inactive nodes, the special-casing in file-posix isn't necessary any more. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2017-05-11file-posix: Add image locking to perm operationsFam Zheng
This extends the permission bits of op blocker API to external using Linux OFD locks. Each permission in @perm and @shared_perm is represented by a locked byte in the image file. Requesting a permission in @perm is translated to a shared lock of the corresponding byte; rejecting to share the same permission is translated to a shared lock of a separate byte. With that, we use 2x number of bytes of distinct permission types. virtlockd in libvirt locks the first byte, so we do locking from a higher offset. Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>