aboutsummaryrefslogtreecommitdiff
path: root/block
AgeCommit message (Collapse)Author
2023-08-30block/io: align requests to subcluster_sizeAndrey Drobyshev
When target image is using subclusters, and we align the request during copy-on-read, it makes sense to align to subcluster_size rather than cluster_size. Otherwise we end up with unnecessary allocations. This commit renames bdrv_round_to_clusters() to bdrv_round_to_subclusters() and utilizes subcluster_size field of BlockDriverInfo to make necessary alignments. It affects copy-on-read as well as mirror job (which is using bdrv_round_to_clusters()). This change also fixes the following bug with failing assert (covered by the test in the subsequent commit): qemu-img create -f qcow2 base.qcow2 64K qemu-img create -f qcow2 -o extended_l2=on,backing_file=base.qcow2,backing_fmt=qcow2 img.qcow2 64K qemu-io -c "write -P 0xaa 0 2K" img.qcow2 qemu-io -C -c "read -P 0x00 2K 62K" img.qcow2 qemu-io: ../block/io.c:1236: bdrv_co_do_copy_on_readv: Assertion `skip_bytes < pnum' failed. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20230711172553.234055-3-andrey.drobyshev@virtuozzo.com>
2023-08-30block: add subcluster_size field to BlockDriverInfoAndrey Drobyshev
This is going to be used in the subsequent commit as requests alignment (in particular, during copy-on-read). This value only makes sense for the formats which support subclusters (currently QCOW2 only). If this field isn't set by driver's own bdrv_get_info() implementation, we simply set it equal to the cluster size thus treating each cluster as having a single subcluster. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20230711172553.234055-2-andrey.drobyshev@virtuozzo.com>
2023-08-03block/blkio: add more comments on the fd passing handlingStefano Garzarella
As Hanna pointed out, it is not clear in the code why qemu_open() can fail, and why blkio_set_int("fd") is not enough to discover the `fd` property support. Let's fix them by adding more details in the code comments. Suggested-by: Hanna Czenczek <hreitz@redhat.com> Reviewed-by: Hanna Czenczek <hreitz@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20230803082825.25293-3-sgarzare@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-08-03block/blkio: close the fd when blkio_connect() failsStefano Garzarella
libblkio drivers take ownership of `fd` only after a successful blkio_connect(), so if it fails, we are still the owners. Fixes: cad2ccc395 ("block/blkio: use qemu_open() to support fd passing for virtio-blk") Suggested-by: Hanna Czenczek <hreitz@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Hanna Czenczek <hreitz@redhat.com> Message-id: 20230803082825.25293-2-sgarzare@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-07-27Merge tag 'block-pull-request' of https://gitlab.com/stefanha/qemu into stagingRichard Henderson
Pull request Please include these bug fixes in QEMU 8.1. Thanks! # -----BEGIN PGP SIGNATURE----- # # iQEzBAABCAAdFiEEhpWov9P5fNqsNXdanKSrs4Grc8gFAmTCzPUACgkQnKSrs4Gr # c8g1DAf/fPUQ4zRsCn079pHIyK9TFo4COm23p4kiusxj8otfjt8LH1Zsc9pGWC2+ # bl2RlnPID8JlyJFDRN7b/RCEhj45a83GtCmhDDmqVgy1eO5vwOKm2XyyWeD+pq/U # Hf2QLPLZZ7tCD8Njpty+gB3Ux4zqthKGXSg8FpJ3w0tl4me2efLvjMa6jHMwtnHT # aAbyQ3WMpT9w4XHLqRQDHzBqrTSY4od3nl9SrM/DQ2klLIcz8ECTEZVBY9B3pq6m # QvAg24tfb0QvS14YnZv/PMCfOaVuE87M9G4f93pCynnMxMYze+XczL0sGhIAS9wp # 03NgGlhGumOix6r2kHjlG6p3xywV8A== # =jMf8 # -----END PGP SIGNATURE----- # gpg: Signature made Thu 27 Jul 2023 01:00:53 PM PDT # gpg: using RSA key 8695A8BFD3F97CDAAC35775A9CA4ABB381AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [full] # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" [full] * tag 'block-pull-request' of https://gitlab.com/stefanha/qemu: block/blkio: use blkio_set_int("fd") to check fd support block/blkio: fall back on using `path` when `fd` setting fails block/blkio: retry blkio_connect() if it fails using `fd` block/blkio: move blkio_connect() in the drivers functions block: Fix pad_request's request restriction block/file-posix: fix g_file_get_contents return path block/blkio: do not use open flags in qemu_open() block/blkio: enable the completion eventfd Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-07-27block/blkio: use blkio_set_int("fd") to check fd supportStefano Garzarella
Setting the `fd` property fails with virtio-blk-* libblkio drivers that do not support fd passing since https://gitlab.com/libblkio/libblkio/-/merge_requests/208. Getting the `fd` property, on the other hand, always succeeds for virtio-blk-* libblkio drivers even when they don't support fd passing. This patch switches to setting the `fd` property because it is a better mechanism for probing fd passing support than getting the `fd` property. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20230727161020.84213-5-sgarzare@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-07-27block/blkio: fall back on using `path` when `fd` setting failsStefano Garzarella
qemu_open() fails if called with an unix domain socket in this way: -blockdev node-name=drive0,driver=virtio-blk-vhost-user,path=vhost-user-blk.sock,cache.direct=on: Could not open 'vhost-user-blk.sock': No such device or address Since virtio-blk-vhost-user does not support fd passing, let`s always fall back on using `path` if we fail the fd passing. Fixes: cad2ccc395 ("block/blkio: use qemu_open() to support fd passing for virtio-blk") Reported-by: Qing Wang <qinwang@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20230727161020.84213-4-sgarzare@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-07-27block/blkio: retry blkio_connect() if it fails using `fd`Stefano Garzarella
libblkio 1.3.0 added support of "fd" property for virtio-blk-vhost-vdpa driver. In QEMU, starting from commit cad2ccc395 ("block/blkio: use qemu_open() to support fd passing for virtio-blk") we are using `blkio_get_int(..., "fd")` to check if the "fd" property is supported for all the virtio-blk-* driver. Unfortunately that property is also available for those driver that do not support it, such as virtio-blk-vhost-user. So, `blkio_get_int()` is not enough to check whether the driver supports the `fd` property or not. This is because the virito-blk common libblkio driver only checks whether or not `fd` is set during `blkio_connect()` and fails with -EINVAL for those transports that do not support it (all except vhost-vdpa for now). So let's handle the `blkio_connect()` failure, retrying it using `path` directly. Fixes: cad2ccc395 ("block/blkio: use qemu_open() to support fd passing for virtio-blk") Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20230727161020.84213-3-sgarzare@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-07-27block/blkio: move blkio_connect() in the drivers functionsStefano Garzarella
This is in preparation for the next patch, where for virtio-blk drivers we need to handle the failure of blkio_connect(). Let's also rename the *_open() functions to *_connect() to make the code reflect the changes applied. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20230727161020.84213-2-sgarzare@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-07-27block: Fix pad_request's request restrictionHanna Czenczek
bdrv_pad_request() relies on requests' lengths not to exceed SIZE_MAX, which bdrv_check_qiov_request() does not guarantee. bdrv_check_request32() however will guarantee this, and both of bdrv_pad_request()'s callers (bdrv_co_preadv_part() and bdrv_co_pwritev_part()) already run it before calling bdrv_pad_request(). Therefore, bdrv_pad_request() can safely call bdrv_check_request32() without expecting error, too. In effect, this patch will not change guest-visible behavior. It is a clean-up to tighten a condition to match what is guaranteed by our callers, and which exists purely to show clearly why the subsequent assertion (`assert(*bytes <= SIZE_MAX)`) is always true. Note there is a difference between the interfaces of bdrv_check_qiov_request() and bdrv_check_request32(): The former takes an errp, the latter does not, so we can no longer just pass &error_abort. Instead, we need to check the returned value. While we do expect success (because the callers have already run this function), an assert(ret == 0) is not much simpler than just to return an error if it occurs, so let us handle errors by returning them up the stack now. Reported-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Hanna Czenczek <hreitz@redhat.com> Message-id: 20230714085938.202730-1-hreitz@redhat.com Fixes: 18743311b829cafc1737a5f20bc3248d5f91ee2a ("block: Collapse padded I/O vecs exceeding IOV_MAX") Signed-off-by: Hanna Czenczek <hreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-07-27block/file-posix: fix g_file_get_contents return pathSam Li
The g_file_get_contents() function returns a g_boolean. If it fails, the returned value will be 0 instead of -1. Solve the issue by skipping assigning ret value. This issue was found by Matthew Rosato using virtio-blk-{pci,ccw} backed by an NVMe partition e.g. /dev/nvme0n1p1 on s390x. Signed-off-by: Sam Li <faithilikerun@gmail.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-id: 20230727115844.8480-1-faithilikerun@gmail.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-07-26block/blkio: do not use open flags in qemu_open()Stefano Garzarella
qemu_open() in blkio_virtio_blk_common_open() is used to open the character device (e.g. /dev/vhost-vdpa-0 or /dev/vfio/vfio) or in the future eventually the unix socket. In all these cases we cannot open the path in read-only mode, when the `read-only` option of blockdev is on, because the exchange of IOCTL commands for example will fail. In order to open the device read-only, we have to use the `read-only` property of the libblkio driver as we already do in blkio_file_open(). Fixes: cad2ccc395 ("block/blkio: use qemu_open() to support fd passing for virtio-blk") Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2225439 Reported-by: Qing Wang <qinwang@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-id: 20230726074807.14041-1-sgarzare@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-07-25block/blkio: enable the completion eventfdStefano Garzarella
Until libblkio 1.3.0, virtio-blk drivers had completion eventfd notifications enabled from the start, but from the next releases this is no longer the case, so we have to explicitly enable them. In fact, the libblkio documentation says they could be disabled, so we should always enable them at the start if we want to be sure to get completion eventfd notifications: By default, the driver might not generate completion events for requests so it is necessary to explicitly enable the completion file descriptor before use: void blkioq_set_completion_fd_enabled(struct blkioq *q, bool enable); I discovered this while trying a development version of libblkio: the guest kernel hangs during boot, while probing the device. Fixes: fd66dbd424f5 ("blkio: add libblkio block driver") Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20230725103744.77343-1-sgarzare@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-07-19nbd/client: Simplify cookie vs. index computationEric Blake
Our code relies on a sentinel cookie value of zero for deciding when a packet has been handled, as well as relying on array indices between 0 and MAX_NBD_REQUESTS-1 for dereferencing purposes. As long as we can symmetrically convert between two forms, there is no reason to go with the odd choice of using XOR with a random pointer, when we can instead simplify the mappings with a mere offset of 1. Using ((uint64_t)-1) as the sentinel instead of NULL such that the two macros could be entirely eliminated might also be possible, but would require a more careful audit to find places where we currently rely on zero-initialization to be interpreted as the sentinel value, so I did not pursue that course. Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20230608135653.2918540-7-eblake@redhat.com> [eblake: enhance commit message] Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-07-19nbd: s/handle/cookie/ to match NBD specEric Blake
Externally, libnbd exposed the 64-bit opaque marker for each client NBD packet as the "cookie", because it was less confusing when contrasted with 'struct nbd_handle *' holding all libnbd state. It also avoids confusion between the noun 'handle' as a way to identify a packet and the verb 'handle' for reacting to things like signals. Upstream NBD changed their spec to favor the name "cookie" based on libnbd's recommendations[1], so we can do likewise. [1] https://github.com/NetworkBlockDevice/nbd/commit/ca4392eb2b Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20230608135653.2918540-6-eblake@redhat.com> [eblake: typo fix] Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-07-17block/nvme: invoke blk_io_plug_call() outside q->lockStefan Hajnoczi
blk_io_plug_call() is invoked outside a blk_io_plug()/blk_io_unplug() section while opening the NVMe drive from: nvme_file_open() -> nvme_init() -> nvme_identify() -> nvme_admin_cmd_sync() -> nvme_submit_command() -> blk_io_plug_call() blk_io_plug_call() immediately invokes the given callback when the current thread is not plugged, as is the case during nvme_file_open(). Unfortunately, nvme_submit_command() calls blk_io_plug_call() with q->lock still held: ... q->sq.tail = (q->sq.tail + 1) % NVME_QUEUE_SIZE; q->need_kick++; blk_io_plug_call(nvme_unplug_fn, q); qemu_mutex_unlock(&q->lock); ^^^^^^^^^^^^^^^^^^^^^^^^^^^ nvme_unplug_fn() deadlocks trying to acquire q->lock because the lock is already acquired by the same thread. The symptom is that QEMU hangs during startup while opening the NVMe drive. Fix this by moving the blk_io_plug_call() outside q->lock. This is safe because no other thread runs code related to this queue and blk_io_plug_call()'s internal state is immune to thread safety issues since it is thread-local. Reported-by: Lukáš Doktor <ldoktor@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Lukas Doktor <ldoktor@redhat.com> Message-id: 20230712191628.252806-1-stefanha@redhat.com Fixes: f2e590002bd6 ("block/nvme: convert to blk_io_plug_call() API") Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-07-04block/blkio: fix module_block.py parsingStefan Hajnoczi
When QEMU is built with --enable-modules, the module_block.py script parses block/*.c to find block drivers that are built as modules. The script generates a table of block drivers called block_driver_modules[]. This table is used for block driver module loading. The blkio.c driver uses macros to define its BlockDriver structs. This was done to avoid code duplication but the module_block.py script is unable to parse the macro. The result is that libblkio-based block drivers can be built as modules but will not be found at runtime. One fix is to make the module_block.py script or build system fancier so it can parse C macros (e.g. by parsing the preprocessed source code). I chose not to do this because it raises the complexity of the build, making future issues harder to debug. Keep things simple: use the macro to avoid duplicating BlockDriver function pointers but define .format_name and .protocol_name manually for each BlockDriver. This way the module_block.py is able to parse the code. Also get rid of the block driver name macros (e.g. DRIVER_IO_URING) because module_block.py cannot parse them either. Fixes: fd66dbd424f5 ("blkio: add libblkio block driver") Reported-by: Qing Wang <qinwang@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20230704123436.187761-1-stefanha@redhat.com Cc: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-06-28Merge tag 'for-upstream' of https://repo.or.cz/qemu/kevin into stagingRichard Henderson
Block layer patches - Re-enable the graph lock - More fixes to coroutine_fn marking # -----BEGIN PGP SIGNATURE----- # # iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAmScQCQRHGt3b2xmQHJl # ZGhhdC5jb20ACgkQfwmycsiPL9bNSA//WIzPT45rFhl2U9QgyOJu26ho6ahsgwgI # Z3QM5kCDB1dAN9USRPxhGboLGo8CyY7eeSwSrR7RtwBGYrWrAoJfGp5gK/7d9s5Q # o0AGgRPnJGhFkBhRRMytsDsewM6Kk4IRmk4HMK3cOH3rsSM8RHs6KmDSBKesllu0 # QVGf3qW4u8LHyZyGM5OlPVUbtuDuK6/52FGhpXBp+x4oyNegOhjwO4mGOvTG+xIk # Q5zwWZaPfjxaEDkvW8iahB6/D7Tpt64BmMf1Ydhxcd5eKEp932CiBI36aAlNKoRD # Al5wztRx1GEh12ekN39jIi7Ypp3JX26keJcieKU0q656pT551UFRYjU0Rk08/Cca # qv2oiQDu6bHgQ9zCQ1nMfa9+K2MyBwx0b5qfYkvs2RzgCTl8ImgBQANHfw8tz6Bq # HUo1zsFBXCaK0boUB5iFwdf3rlx3t9UTEuDej/RaHqZjZD5xeG/smCcOlSfHaKUa # wXfYxvm8ZfefJn1D6io1A+7M956uvIQNtmh13cU44clgFX9Y/bBNMg/5lMRsJKo8 # xxjvqCAyxo/pPfUsVWx4pc8AXbfVa85gyoSiaLEYZnqP54sJ2lFccqykCsTy58Lo # VDcoPnoSc+LNqBOvtzxXgQbEWFCXU6fe0+TZgVYUvExWFIAOImeDWg2GD1JVrwsX # e9QrPhL3DXg= # =ZQcP # -----END PGP SIGNATURE----- # gpg: Signature made Wed 28 Jun 2023 04:13:56 PM CEST # gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6 # gpg: issuer "kwolf@redhat.com" # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] * tag 'for-upstream' of https://repo.or.cz/qemu/kevin: (23 commits) block: use bdrv_co_debug_event in coroutine context block: use bdrv_co_getlength in coroutine context qcow2: mark more functions as coroutine_fns and GRAPH_RDLOCK vhdx: mark more functions as coroutine_fns and GRAPH_RDLOCK vmdk: mark more functions as coroutine_fns and GRAPH_RDLOCK dmg: mark more functions as coroutine_fns and GRAPH_RDLOCK cloop: mark more functions as coroutine_fns and GRAPH_RDLOCK block: mark another function as coroutine_fns and GRAPH_UNLOCKED bochs: mark more functions as coroutine_fns and GRAPH_RDLOCK vpc: mark more functions as coroutine_fns and GRAPH_RDLOCK qed: mark more functions as coroutine_fns and GRAPH_RDLOCK file-posix: remove incorrect coroutine_fn calls Revert "graph-lock: Disable locking for now" graph-lock: Unlock the AioContext while polling blockjob: Fix AioContext locking in block_job_add_bdrv() block: Fix AioContext locking in bdrv_open_backing_file() block: Fix AioContext locking in bdrv_open_inherit() block: Fix AioContext locking in bdrv_reopen_parse_file_or_backing() block: Fix AioContext locking in bdrv_attach_child_common() block: Fix AioContext locking in bdrv_open_child() ... Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-06-28block: use bdrv_co_debug_event in coroutine contextPaolo Bonzini
bdrv_co_debug_event was recently introduced, with bdrv_debug_event becoming a wrapper for use in unknown context. Because most of the time bdrv_debug_event is used on a BdrvChild via the wrapper macro BLKDBG_EVENT, introduce a similar macro BLKDBG_CO_EVENT that calls bdrv_co_debug_event, and switch whenever possible. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20230601115145.196465-13-pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-06-28block: use bdrv_co_getlength in coroutine contextPaolo Bonzini
bdrv_co_getlength was recently introduced, with bdrv_getlength becoming a wrapper for use in unknown context. Switch to bdrv_co_getlength when possible. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20230601115145.196465-12-pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-06-28qcow2: mark more functions as coroutine_fns and GRAPH_RDLOCKPaolo Bonzini
Mark functions as coroutine_fn when they are only called by other coroutine_fns and they can suspend. Change calls to co_wrappers to use the non-wrapped functions, which in turn requires adding GRAPH_RDLOCK annotations. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20230601115145.196465-11-pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-06-28vhdx: mark more functions as coroutine_fns and GRAPH_RDLOCKPaolo Bonzini
Mark functions as coroutine_fn when they are only called by other coroutine_fns and they can suspend. Change calls to co_wrappers to use the non-wrapped functions, which in turn requires adding GRAPH_RDLOCK annotations. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20230601115145.196465-10-pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-06-28vmdk: mark more functions as coroutine_fns and GRAPH_RDLOCKPaolo Bonzini
Mark functions as coroutine_fn when they are only called by other coroutine_fns and they can suspend. Change calls to co_wrappers to use the non-wrapped functions, which in turn requires adding GRAPH_RDLOCK annotations. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20230601115145.196465-9-pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-06-28dmg: mark more functions as coroutine_fns and GRAPH_RDLOCKPaolo Bonzini
Mark functions as coroutine_fn when they are only called by other coroutine_fns and they can suspend. Change calls to co_wrappers to use the non-wrapped functions, which in turn requires adding GRAPH_RDLOCK annotations. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20230601115145.196465-8-pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-06-28cloop: mark more functions as coroutine_fns and GRAPH_RDLOCKPaolo Bonzini
Mark functions as coroutine_fn when they are only called by other coroutine_fns and they can suspend. Change calls to co_wrappers to use the non-wrapped functions, which in turn requires adding GRAPH_RDLOCK annotations. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20230601115145.196465-7-pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-06-28bochs: mark more functions as coroutine_fns and GRAPH_RDLOCKPaolo Bonzini
Mark functions as coroutine_fn when they are only called by other coroutine_fns and they can suspend. Change calls to co_wrappers to use the non-wrapped functions, which in turn requires adding GRAPH_RDLOCK annotations. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20230601115145.196465-5-pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-06-28vpc: mark more functions as coroutine_fns and GRAPH_RDLOCKPaolo Bonzini
Mark functions as coroutine_fn when they are only called by other coroutine_fns and they can suspend. Change calls to co_wrappers to use the non-wrapped functions, which in turn requires adding GRAPH_RDLOCK annotations. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20230601115145.196465-4-pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-06-28qed: mark more functions as coroutine_fns and GRAPH_RDLOCKPaolo Bonzini
Mark functions as coroutine_fn when they are only called by other coroutine_fns and they can suspend. Change calls to co_wrappers to use the non-wrapped functions, which in turn requires adding GRAPH_RDLOCK annotations. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20230601115145.196465-3-pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-06-28file-posix: remove incorrect coroutine_fn callsPaolo Bonzini
raw_co_getlength is called by handle_aiocb_write_zeroes, which is not a coroutine function. This is harmless because raw_co_getlength does not actually suspend, but in the interest of clarity make it a non-coroutine_fn that is just wrapped by the coroutine_fn raw_co_getlength. Likewise, check_cache_dropped was only a coroutine_fn because it called raw_co_getlength, so it can be made non-coroutine as well. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20230601115145.196465-2-pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-06-28Revert "graph-lock: Disable locking for now"Kevin Wolf
Now that bdrv_graph_wrlock() temporarily drops the AioContext lock that its caller holds, it can poll without causing deadlocks. We can now re-enable graph locking. This reverts commit ad128dff0bf4b6f971d05eb4335a627883a19c1d. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20230605085711.21261-12-kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-06-28graph-lock: Unlock the AioContext while pollingKevin Wolf
If the caller keeps the AioContext lock for a block node in an iothread, polling in bdrv_graph_wrlock() deadlocks if the condition isn't fulfilled immediately. Now that all callers make sure to actually have the AioContext locked when they call bdrv_replace_child_noperm() like they should, we can change bdrv_graph_wrlock() to take a BlockDriverState whose AioContext lock the caller holds (NULL if it doesn't) and unlock it temporarily while polling. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20230605085711.21261-11-kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-06-26vhost-user: fully use new backend/frontend namingManos Pitsidianakis
Slave/master nomenclature was replaced with backend/frontend in commit 1fc19b65279a ("vhost-user: Adopt new backend naming") This patch replaces all remaining uses of master and slave in the codebase. Signed-off-by: Emmanouil Pitsidianakis <manos.pitsidianakis@linaro.org> Message-Id: <20230613080849.2115347-1-manos.pitsidianakis@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2023-06-20meson: Replace softmmu_ss -> system_ssPhilippe Mathieu-Daudé
We use the user_ss[] array to hold the user emulation sources, and the softmmu_ss[] array to hold the system emulation ones. Hold the latter in the 'system_ss[]' array for parity with user emulation. Mechanical change doing: $ sed -i -e s/softmmu_ss/system_ss/g $(git grep -l softmmu_ss) Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20230613133347.82210-10-philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-06-06Merge tag 'pull-request-2023-06-06' of https://gitlab.com/thuth/qemu into ↵Richard Henderson
staging * Fix emulated LCCB, LOCFHR, MXDB and MXDBR s390x instructions * Fix the malta machine on s390x (big endian) hosts * Emulate /proc/cpuinfo on s390x * Remove pointless QOM casts * Improve the inclusion logic for libkeyutils and ipmi-bt-test in meson.build # -----BEGIN PGP SIGNATURE----- # # iQJFBAABCAAvFiEEJ7iIR+7gJQEY8+q5LtnXdP5wLbUFAmR+ycgRHHRodXRoQHJl # ZGhhdC5jb20ACgkQLtnXdP5wLbWXXw//WPz3ng50KLS+M1t3/ULEjO6XkGfP2LQZ # RsZq3hf9THFPZgcREk+6SQvttOSTuvHNfakfujS6U1Ou5thReWqLe4itFW6+hB5j # kQ+Sm6YJ+fpezkBnSefcUoL5nA9VVKZ6KE6kxq5CUBZNoIk1sSsfrU8y8wjzW0yg # 2nraOcG10aLpO2BfvKHVEAhJtwl9pHJsFANmHC2/h2wC9BZIAzdxiytzdcJ909gN # AAa0hIrLK/oFgJjkSSxu+QTaVGPARXqkx5WV546F/zmDMFUWd9nrXaegwqxjgPBN # m9Ua0SXll5hX2Z57vjJWlbTYkD+JUB22L0N7p5/xzhYRpLVSq1pdveo9psrzIC3E # Bt7chZB58acQepJHxxa3UHDOHcnfdfaN+Dd9wD29wHr7nK8lOcsen7/7V+5YXomc # qenkCtkpjKTl07OBxe6MDGZtPZYA8fK1CjEyYwHCe8QvxEzsyg96Bm3j4N2VPxQU # +f/sFPX7SgogZI4mB4wdoxOF1RmQ+DXQ2tnB970txZRkmFq2jJHpW86jkkbq2Jl1 # KIjgdIXjVgy+MPtuQzO5cT+jfhGQL7FQynGXHjv/UidBid5XD3TDVNa9AthN3Mng # +rPT90VJ7j9soMqvmNT1COSIRD+M49dQKBIQuq/gWplaTOHaAcJrCwYScwqe0u0P # zmjCNeuPVw8= # =dfJr # -----END PGP SIGNATURE----- # gpg: Signature made Mon 05 Jun 2023 10:53:12 PM PDT # gpg: using RSA key 27B88847EEE0250118F3EAB92ED9D774FE702DB5 # gpg: issuer "thuth@redhat.com" # gpg: Good signature from "Thomas Huth <th.huth@gmx.de>" [unknown] # gpg: aka "Thomas Huth <thuth@redhat.com>" [unknown] # gpg: aka "Thomas Huth <th.huth@posteo.de>" [unknown] # gpg: aka "Thomas Huth <huth@tuxfamily.org>" [unknown] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: 27B8 8847 EEE0 2501 18F3 EAB9 2ED9 D774 FE70 2DB5 * tag 'pull-request-2023-06-06' of https://gitlab.com/thuth/qemu: linux-user: Emulate /proc/cpuinfo on s390x linux-user/elfload: Introduce elf_hwcap_str() on s390x linux-user/elfload: Expose get_elf_hwcap() on s390x s390x/tcg: Fix CPU address returned by STIDP bulk: Remove pointless QOM casts scripts: Add qom-cast-macro-clean-cocci-gen.py hw/mips/malta: Fix the malta machine on big endian hosts gitlab-ci: Remove unused Python package tests/qtest: Run ipmi-bt-test only if CONFIG_IPMI_EXTERN is set tests/tcg/s390x: Test MXDB and MXDBR target/s390x: Fix MXDB and MXDBR Add conditional dependency for libkeyutils tests/tcg/s390x: Test single-stepping SVC linux-user/s390x: Fix single-stepping SVC tests/tcg/s390x: Test LOCFHR target/s390x: Fix LOCFHR taking the wrong half of R2 tests/tcg/s390x: Test LCBB target/s390x: Fix LCBB overwriting the top 32 bits Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-06-05bulk: Remove pointless QOM castsPhilippe Mathieu-Daudé
Mechanical change running Coccinelle spatch with content generated from the qom-cast-macro-clean-cocci-gen.py added in the previous commit. Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20230601093452.38972-3-philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
2023-06-05qcow2: add discard-no-unref optionJean-Louis Dupond
When we for example have a sparse qcow2 image and discard: unmap is enabled, there can be a lot of fragmentation in the image after some time. Especially on VM's that do a lot of writes/deletes. This causes the qcow2 image to grow even over 110% of its virtual size, because the free gaps in the image get too small to allocate new continuous clusters. So it allocates new space at the end of the image. Disabling discard is not an option, as discard is needed to keep the incremental backup size as low as possible. Without discard, the incremental backups would become large, as qemu thinks it's just dirty blocks but it doesn't know the blocks are unneeded. So we need to avoid fragmentation but also 'empty' the unneeded blocks in the image to have a small incremental backup. In addition, we also want to send the discards further down the stack, so the underlying blocks are still discarded. Therefor we introduce a new qcow2 option "discard-no-unref". When setting this option to true, discards will no longer have the qcow2 driver relinquish cluster allocations. Other than that, the request is handled as normal: All clusters in range are marked as zero, and, if pass-discard-request is true, it is passed further down the stack. The only difference is that the now-zero clusters are preallocated instead of being unallocated. This will avoid fragmentation on the qcow2 image. Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1621 Signed-off-by: Jean-Louis Dupond <jean-louis@dupond.be> Message-Id: <20230605084523.34134-2-jean-louis@dupond.be> Reviewed-by: Hanna Czenczek <hreitz@redhat.com> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
2023-06-05parallels: Incorrect condition in out-of-image checkAlexander Ivanov
All the offsets in the BAT must be lower than the file size. Fix the check condition for correct check. Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Message-Id: <20230424093147.197643-13-alexander.ivanov@virtuozzo.com> Reviewed-by: Hanna Czenczek <hreitz@redhat.com> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
2023-06-05parallels: Replace qemu_co_mutex_lock by WITH_QEMU_LOCK_GUARDAlexander Ivanov
Replace the way we use mutex in parallels_co_check() for simplier and less error prone code. Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Message-Id: <20230424093147.197643-12-alexander.ivanov@virtuozzo.com> Reviewed-by: Hanna Czenczek <hreitz@redhat.com> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
2023-06-05parallels: Move statistic collection to a separate functionAlexander Ivanov
We will add more and more checks so we need a better code structure in parallels_co_check. Let each check performs in a separate loop in a separate helper. Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-Id: <20230424093147.197643-11-alexander.ivanov@virtuozzo.com> Reviewed-by: Hanna Czenczek <hreitz@redhat.com> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
2023-06-05parallels: Move check of leaks to a separate functionAlexander Ivanov
We will add more and more checks so we need a better code structure in parallels_co_check. Let each check performs in a separate loop in a separate helper. Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com> Message-Id: <20230424093147.197643-10-alexander.ivanov@virtuozzo.com> Reviewed-by: Hanna Czenczek <hreitz@redhat.com> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
2023-06-05parallels: Fix statistics calculationAlexander Ivanov
Exclude out-of-image clusters from allocated and fragmented clusters calculation. Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com> Message-Id: <20230424093147.197643-9-alexander.ivanov@virtuozzo.com> Reviewed-by: Hanna Czenczek <hreitz@redhat.com> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
2023-06-05parallels: Move check of cluster outside image to a separate functionAlexander Ivanov
We will add more and more checks so we need a better code structure in parallels_co_check. Let each check performs in a separate loop in a separate helper. Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Message-Id: <20230424093147.197643-8-alexander.ivanov@virtuozzo.com> Reviewed-by: Hanna Czenczek <hreitz@redhat.com> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
2023-06-05parallels: Move check of unclean image to a separate functionAlexander Ivanov
We will add more and more checks so we need a better code structure in parallels_co_check. Let each check performs in a separate loop in a separate helper. Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-Id: <20230424093147.197643-7-alexander.ivanov@virtuozzo.com> Reviewed-by: Hanna Czenczek <hreitz@redhat.com> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
2023-06-05parallels: Use generic infrastructure for BAT writing in parallels_co_check()Alexander Ivanov
BAT is written in the context of conventional operations over the image inside bdrv_co_flush() when it calls parallels_co_flush_to_os() callback. Thus we should not modify BAT array directly, but call parallels_set_bat_entry() helper and bdrv_co_flush() further on. After that there is no need to manually write BAT and track its modification. This makes code more generic and allows to split parallels_set_bat_entry() for independent pieces. Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Message-Id: <20230424093147.197643-6-alexander.ivanov@virtuozzo.com> Reviewed-by: Hanna Czenczek <hreitz@redhat.com> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
2023-06-05parallels: create parallels_set_bat_entry_helper() to assign BAT valueAlexander Ivanov
This helper will be reused in next patches during parallels_co_check rework to simplify its code. Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-Id: <20230424093147.197643-5-alexander.ivanov@virtuozzo.com> Reviewed-by: Hanna Czenczek <hreitz@redhat.com> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
2023-06-05parallels: Fix image_end_offset and data_end after out-of-image checkAlexander Ivanov
Set data_end to the end of the last cluster inside the image. In such a way we can be sure that corrupted offsets in the BAT can't affect on the image size. If there are no allocated clusters set image_end_offset by data_end. Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Message-Id: <20230424093147.197643-4-alexander.ivanov@virtuozzo.com> Reviewed-by: Hanna Czenczek <hreitz@redhat.com> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
2023-06-05parallels: Fix high_off calculation in parallels_co_check()Alexander Ivanov
Don't let high_off be more than the file size even if we don't fix the image. Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-Id: <20230424093147.197643-3-alexander.ivanov@virtuozzo.com> Reviewed-by: Hanna Czenczek <hreitz@redhat.com> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
2023-06-05parallels: Out of image offset in BAT leads to image inflationAlexander Ivanov
data_end field in BDRVParallelsState is set to the biggest offset present in BAT. If this offset is outside of the image, any further write will create the cluster at this offset and/or the image will be truncated to this offset on close. This is definitely not correct. Raise an error in parallels_open() if data_end points outside the image and it is not a check (let the check to repaire the image). Set data_end to the end of the cluster with the last correct offset. Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com> Message-Id: <20230424093147.197643-2-alexander.ivanov@virtuozzo.com> Reviewed-by: Hanna Czenczek <hreitz@redhat.com> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
2023-06-05block: Collapse padded I/O vecs exceeding IOV_MAXHanna Czenczek
When processing vectored guest requests that are not aligned to the storage request alignment, we pad them by adding head and/or tail buffers for a read-modify-write cycle. The guest can submit I/O vectors up to IOV_MAX (1024) in length, but with this padding, the vector can exceed that limit. As of 4c002cef0e9abe7135d7916c51abce47f7fc1ee2 ("util/iov: make qemu_iovec_init_extended() honest"), we refuse to pad vectors beyond the limit, instead returning an error to the guest. To the guest, this appears as a random I/O error. We should not return an I/O error to the guest when it issued a perfectly valid request. Before 4c002cef0e9abe7135d7916c51abce47f7fc1ee2, we just made the vector longer than IOV_MAX, which generally seems to work (because the guest assumes a smaller alignment than we really have, file-posix's raw_co_prw() will generally see bdrv_qiov_is_aligned() return false, and so emulate the request, so that the IOV_MAX does not matter). However, that does not seem exactly great. I see two ways to fix this problem: 1. We split such long requests into two requests. 2. We join some elements of the vector into new buffers to make it shorter. I am wary of (1), because it seems like it may have unintended side effects. (2) on the other hand seems relatively simple to implement, with hopefully few side effects, so this patch does that. To do this, the use of qemu_iovec_init_extended() in bdrv_pad_request() is effectively replaced by the new function bdrv_create_padded_qiov(), which not only wraps the request IOV with padding head/tail, but also ensures that the resulting vector will not have more than IOV_MAX elements. Putting that functionality into qemu_iovec_init_extended() is infeasible because it requires allocating a bounce buffer; doing so would require many more parameters (buffer alignment, how to initialize the buffer, and out parameters like the buffer, its length, and the original elements), which is not reasonable. Conversely, it is not difficult to move qemu_iovec_init_extended()'s functionality into bdrv_create_padded_qiov() by using public qemu_iovec_* functions, so that is what this patch does. Because bdrv_pad_request() was the only "serious" user of qemu_iovec_init_extended(), the next patch will remove the latter function, so the functionality is not implemented twice. Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2141964 Signed-off-by: Hanna Czenczek <hreitz@redhat.com> Message-Id: <20230411173418.19549-3-hreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-06-02cutils: Adjust signature of parse_uint[_full]Eric Blake
It's already confusing that we have two very similar functions for wrapping the parse of a 64-bit unsigned value, differing mainly on whether they permit leading '-'. Adjust the signature of parse_uint() and parse_uint_full() to be like all of qemu_strto*(): put the result parameter last, use the same types (uint64_t and unsigned long long have the same width, but are not always the same type), and mark endptr const (this latter change only affects the rare caller of parse_uint). Adjust all callers in the tree. While at it, note that since cutils.c already includes: QEMU_BUILD_BUG_ON(sizeof(int64_t) != sizeof(long long)); we are guaranteed that the result of parse_uint* cannot exceed UINT64_MAX (or the build would have failed), so we can drop pre-existing dead comparisons in opts-visitor.c that were never false. Reviewed-by: Hanna Czenczek <hreitz@redhat.com> Message-Id: <20230522190441.64278-8-eblake@redhat.com> [eblake: Drop dead code spotted by Markus] Signed-off-by: Eric Blake <eblake@redhat.com>