aboutsummaryrefslogtreecommitdiff
path: root/block/file-posix.c
AgeCommit message (Collapse)Author
2023-12-21block/file-posix: set up Linux AIO and io_uring in the current threadStefan Hajnoczi
The file-posix block driver currently only sets up Linux AIO and io_uring in the BDS's AioContext. In the multi-queue block layer we must be able to submit I/O requests in AioContexts that do not have Linux AIO and io_uring set up yet since any thread can call into the block driver. Set up Linux AIO and io_uring for the current AioContext during request submission. We lose the ability to return an error from .bdrv_file_open() when Linux AIO and io_uring setup fails (e.g. due to resource limits). Instead the user only gets warnings and we fall back to aio=threads. This is still better than a fatal error after startup. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20230914140101.1065008-2-stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-11-06file-posix: fix over-writing of returning zone_append offsetNaohiro Aota
raw_co_zone_append() sets "s->offset" where "BDRVRawState *s". This pointer is used later at raw_co_prw() to save the block address where the data is written. When multiple IOs are on-going at the same time, a later IO's raw_co_zone_append() call over-writes a former IO's offset address before raw_co_prw() completes. As a result, the former zone append IO returns the initial value (= the start address of the writing zone), instead of the proper address. Fix the issue by passing the offset pointer to raw_co_prw() instead of passing it through s->offset. Also, remove "offset" from BDRVRawState as there is no usage anymore. Fixes: 4751d09adcc3 ("block: introduce zone append write for zoned devices") Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Message-Id: <20231030073853.2601162-1-naohiro.aota@wdc.com> Reviewed-by: Sam Li <faithilikerun@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
2023-11-06block/file-posix: fix update_zones_wp() callerSam Li
When the zoned request fail, it needs to update only the wp of the target zones for not disrupting the in-flight writes on these other zones. The wp is updated successfully after the request completes. Fixed the callers with right offset and nr_zones. Signed-off-by: Sam Li <faithilikerun@gmail.com> Message-Id: <20230825040556.4217-1-faithilikerun@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> [hreitz: Rebased and fixed comment spelling] Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
2023-09-21Merge tag 'pull-block-2023-09-01' of https://gitlab.com/hreitz/qemu into stagingStefan Hajnoczi
Block patches - Fix for file-posix's zoning code crashing on I/O errors - Throttling refactoring # -----BEGIN PGP SIGNATURE----- # # iQJGBAABCAAwFiEEy2LXoO44KeRfAE00ofpA0JgBnN8FAmTxnMISHGhyZWl0ekBy # ZWRoYXQuY29tAAoJEKH6QNCYAZzfYkUP+gMG9hhzvgjj/tw9rEBQjciihzcQmqQJ # 2Mm37RH2jj5bnnTdaTbMkcRRwVhncYSCwK9q5EYVbZmU9C/v4YJmsSEQlcl7wVou # hbPUv6NHaBrJZX9nxNSa2RHui6pZMLKa/D0rJVB7NjYBrrRtiPo7kiLVQYjYXa2g # kcCCfY4t3Z2RxOP31mMXRjYlhJE9bIuZdTEndrKme8KS2JGPZEJ9xjkoW1tj96EX # oc/Cg2vk7AEtsFYA0bcD8fTFkBDJEwyYl3usu7Tk24pvH16jk7wFSqRVSsDMfnER # tG8X3mHLIY0hbSkpzdHJdXINvZ6FWpQb0CGzIKr+pMiuWVdWr1HglBr0m4pVF+Y4 # A6AI6VX2JJgtacypoDyCZC9mzs1jIdeiwq9v5dyuikJ6ivTwEEoeoSLnLTN3AjXn # 0mtQYzgCg5Gd6+rTo7XjSO9SSlbaVrDl/B2eXle6tmIFT5k+86fh0hc+zTmP8Rkw # Knbc+5Le95wlMrOUNx2GhXrTGwX510hLxKboho/LITxtAzqvXnEJKrYbnkm3WPnw # wfHnR5VQH1NKEpiH/p33og6OV/vu9e7vgp0ZNZV136SnzC90C1zMUwg2simJW701 # 34EtN0XBX8XBKrxfe7KscV9kRE8wrWWJVbhp+WOcQEomGI8uraxzWqDIk/v7NZXv # m4XBscaB+Iri # =oKgk # -----END PGP SIGNATURE----- # gpg: Signature made Fri 01 Sep 2023 04:11:46 EDT # gpg: using RSA key CB62D7A0EE3829E45F004D34A1FA40D098019CDF # gpg: issuer "hreitz@redhat.com" # gpg: Good signature from "Hanna Reitz <hreitz@redhat.com>" [unknown] # gpg: WARNING: The key's User ID is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: CB62 D7A0 EE38 29E4 5F00 4D34 A1FA 40D0 9801 9CDF * tag 'pull-block-2023-09-01' of https://gitlab.com/hreitz/qemu: tests/file-io-error: New test file-posix: Simplify raw_co_prw's 'out' zone code file-posix: Fix zone update in I/O error path file-posix: Check bs->bl.zoned for zone info file-posix: Clear bs->bl.zoned on error block/throttle-groups: Use ThrottleDirection instread of bool is_write fsdev: Use ThrottleDirection instread of bool is_write throttle: use THROTTLE_MAX/ARRAY_SIZE for hard code throttle: use enum ThrottleDirection instead of bool is_write cryptodev: use NULL throttle timer cb for read direction test-throttle: test read only and write only throttle: support read-only and write-only test-throttle: use enum ThrottleDirection throttle: introduce enum ThrottleDirection Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-09-08block: spelling fixesMichael Tokarev
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> Reviewed-by: Eric Blake <eblake@redhat.com>
2023-08-29file-posix: Simplify raw_co_prw's 'out' zone codeHanna Czenczek
We duplicate the same condition three times here, pull it out to the top level. Signed-off-by: Hanna Czenczek <hreitz@redhat.com> Message-Id: <20230824155345.109765-5-hreitz@redhat.com> Reviewed-by: Sam Li <faithilikerun@gmail.com>
2023-08-29file-posix: Fix zone update in I/O error pathHanna Czenczek
We must check that zone information is present before running update_zones_wp(). Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2234374 Fixes: Coverity CID 1512459 Signed-off-by: Hanna Czenczek <hreitz@redhat.com> Message-Id: <20230824155345.109765-4-hreitz@redhat.com> Reviewed-by: Sam Li <faithilikerun@gmail.com>
2023-08-29file-posix: Check bs->bl.zoned for zone infoHanna Czenczek
Instead of checking bs->wps or bs->bl.zone_size for whether zone information is present, check bs->bl.zoned. That is the flag that raw_refresh_zoned_limits() reliably sets to indicate zone support. If it is set to something other than BLK_Z_NONE, other values and objects like bs->wps and bs->bl.zone_size must be non-null/zero and valid; if it is not, we cannot rely on their validity. Signed-off-by: Hanna Czenczek <hreitz@redhat.com> Message-Id: <20230824155345.109765-3-hreitz@redhat.com> Reviewed-by: Sam Li <faithilikerun@gmail.com>
2023-08-29file-posix: Clear bs->bl.zoned on errorHanna Czenczek
bs->bl.zoned is what indicates whether the zone information is present and valid; it is the only thing that raw_refresh_zoned_limits() sets if CONFIG_BLKZONED is not defined, and it is also the only thing that it sets if CONFIG_BLKZONED is defined, but there are no zones. Make sure that it is always set to BLK_Z_NONE if there is an error anywhere in raw_refresh_zoned_limits() so that we do not accidentally announce zones while our information is incomplete or invalid. This also fixes a memory leak in the last error path in raw_refresh_zoned_limits(). Signed-off-by: Hanna Czenczek <hreitz@redhat.com> Message-Id: <20230824155345.109765-2-hreitz@redhat.com> Reviewed-by: Sam Li <faithilikerun@gmail.com>
2023-07-27block/file-posix: fix g_file_get_contents return pathSam Li
The g_file_get_contents() function returns a g_boolean. If it fails, the returned value will be 0 instead of -1. Solve the issue by skipping assigning ret value. This issue was found by Matthew Rosato using virtio-blk-{pci,ccw} backed by an NVMe partition e.g. /dev/nvme0n1p1 on s390x. Signed-off-by: Sam Li <faithilikerun@gmail.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-id: 20230727115844.8480-1-faithilikerun@gmail.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-06-28file-posix: remove incorrect coroutine_fn callsPaolo Bonzini
raw_co_getlength is called by handle_aiocb_write_zeroes, which is not a coroutine function. This is harmless because raw_co_getlength does not actually suspend, but in the interest of clarity make it a non-coroutine_fn that is just wrapped by the coroutine_fn raw_co_getlength. Likewise, check_cache_dropped was only a coroutine_fn because it called raw_co_getlength, so it can be made non-coroutine as well. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20230601115145.196465-2-pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-06-01block/linux-aio: convert to blk_io_plug_call() APIStefan Hajnoczi
Stop using the .bdrv_co_io_plug() API because it is not multi-queue block layer friendly. Use the new blk_io_plug_call() API to batch I/O submission instead. Note that a dev_max_batch check is dropped in laio_io_unplug() because the semantics of unplug_fn() are different from .bdrv_co_unplug(): 1. unplug_fn() is only called when the last blk_io_unplug() call occurs, not every time blk_io_unplug() is called. 2. unplug_fn() is per-thread, not per-BlockDriverState, so there is no way to get per-BlockDriverState fields like dev_max_batch. Therefore this condition cannot be moved to laio_unplug_fn(). It is not obvious that this condition affects performance in practice, so I am removing it instead of trying to come up with a more complex mechanism to preserve the condition. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Acked-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20230530180959.1108766-6-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-06-01block/io_uring: convert to blk_io_plug_call() APIStefan Hajnoczi
Stop using the .bdrv_co_io_plug() API because it is not multi-queue block layer friendly. Use the new blk_io_plug_call() API to batch I/O submission instead. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Kevin Wolf <kwolf@redhat.com> Message-id: 20230530180959.1108766-5-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-05-15block: add some trace events for zone appendSam Li
Signed-off-by: Sam Li <faithilikerun@gmail.com> Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20230508051510.177850-5-faithilikerun@gmail.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-05-15block: introduce zone append write for zoned devicesSam Li
A zone append command is a write operation that specifies the first logical block of a zone as the write position. When writing to a zoned block device using zone append, the byte offset of the call may point at any position within the zone to which the data is being appended. Upon completion the device will respond with the position where the data has been written in the zone. Signed-off-by: Sam Li <faithilikerun@gmail.com> Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20230508051510.177850-3-faithilikerun@gmail.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-05-15file-posix: add tracking of the zone write pointersSam Li
Since Linux doesn't have a user API to issue zone append operations to zoned devices from user space, the file-posix driver is modified to add zone append emulation using regular writes. To do this, the file-posix driver tracks the wp location of all zones of the device. It uses an array of uint64_t. The most significant bit of each wp location indicates if the zone type is conventional zones. The zones wp can be changed due to the following operations issued: - zone reset: change the wp to the start offset of that zone - zone finish: change to the end location of that zone - write to a zone - zone append Signed-off-by: Sam Li <faithilikerun@gmail.com> Message-id: 20230508051510.177850-2-faithilikerun@gmail.com [Fix errno propagation from handle_aiocb_zone_mgmt() --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-05-15block: add some trace events for new block layer APIsSam Li
Signed-off-by: Sam Li <faithilikerun@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Acked-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20230508045533.175575-8-faithilikerun@gmail.com Message-id: 20230324090605.28361-8-faithilikerun@gmail.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-05-15block: add zoned BlockDriver check to block layerSam Li
Putting zoned/non-zoned BlockDrivers on top of each other is not allowed. Signed-off-by: Sam Li <faithilikerun@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Acked-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20230508045533.175575-6-faithilikerun@gmail.com Message-id: 20230324090605.28361-6-faithilikerun@gmail.com [Adjust commit message prefix as suggested by Philippe Mathieu-Daudé <philmd@linaro.org> and clarify that the check is about zoned BlockDrivers. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-05-15block/block-backend: add block layer APIs resembling Linux ZonedBlockDevice ↵Sam Li
ioctls Add zoned device option to host_device BlockDriver. It will be presented only for zoned host block devices. By adding zone management operations to the host_block_device BlockDriver, users can use the new block layer APIs including Report Zone and four zone management operations (open, close, finish, reset, reset_all). Qemu-io uses the new APIs to perform zoned storage commands of the device: zone_report(zrp), zone_open(zo), zone_close(zc), zone_reset(zrs), zone_finish(zf). For example, to test zone_report, use following command: $ ./build/qemu-io --image-opts -n driver=host_device, filename=/dev/nullb0 -c "zrp offset nr_zones" Signed-off-by: Sam Li <faithilikerun@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Acked-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20230508045533.175575-4-faithilikerun@gmail.com Message-id: 20230324090605.28361-4-faithilikerun@gmail.com [Adjust commit message prefix as suggested by Philippe Mathieu-Daudé <philmd@linaro.org> and remove spurious ret = -errno in raw_co_zone_mgmt(). --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-05-15block/file-posix: introduce helper functions for sysfs attributesSam Li
Use get_sysfs_str_val() to get the string value of device zoned model. Then get_sysfs_zoned_model() can convert it to BlockZoneModel type of QEMU. Use get_sysfs_long_val() to get the long value of zoned device information. Signed-off-by: Sam Li <faithilikerun@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Acked-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20230508045533.175575-3-faithilikerun@gmail.com Message-id: 20230324090605.28361-3-faithilikerun@gmail.com [Adjust commit message prefix as suggested by Philippe Mathieu-Daudé <philmd@linaro.org>. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-04-25thread-pool: avoid passing the pool parameter every timeEmanuele Giuseppe Esposito
thread_pool_submit_aio() is always called on a pool taken from qemu_get_current_aio_context(), and that is the only intended use: each pool runs only in the same thread that is submitting work to it, it can't run anywhere else. Therefore simplify the thread_pool_submit* API and remove the ThreadPool function parameter. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20230203131731.851116-5-eesposit@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-04-25thread-pool: use ThreadPool from the running threadEmanuele Giuseppe Esposito
Use qemu_get_current_aio_context() where possible, since we always submit work to the current thread anyways. We want to also be sure that the thread submitting the work is the same as the one processing the pool, to avoid adding synchronization to the pool list. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20230203131731.851116-4-eesposit@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-04-25io_uring: use LuringState from the running threadEmanuele Giuseppe Esposito
Remove usage of aio_context_acquire by always submitting asynchronous AIO to the current thread's LuringState. In order to prevent mistakes from the caller side, avoid passing LuringState in luring_io_{plug/unplug} and luring_co_submit, and document the functions to make clear that they work in the current thread's AioContext. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20230203131731.851116-3-eesposit@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-04-25linux-aio: use LinuxAioState from the running threadEmanuele Giuseppe Esposito
Remove usage of aio_context_acquire by always submitting asynchronous AIO to the current thread's LinuxAioState. In order to prevent mistakes from the caller side, avoid passing LinuxAioState in laio_io_{plug/unplug} and laio_co_submit, and document the functions to make clear that they work in the current thread's AioContext. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20230203131731.851116-2-eesposit@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-04-11block: remove has_variable_length from BlockDriverPaolo Bonzini
Fill in the field in BlockLimits directly for host devices, and copy it from there for the raw format. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20230407153303.391121-5-pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-02-23block: Mark bdrv_co_create() and callers GRAPH_RDLOCKKevin Wolf
This adds GRAPH_RDLOCK annotations to declare that callers of bdrv_co_create() need to hold a reader lock for the graph. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230203152202.49054-17-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-02-23block: Mark bdrv_co_copy_range() GRAPH_RDLOCKEmanuele Giuseppe Esposito
This adds GRAPH_RDLOCK annotations to declare that callers of bdrv_co_copy_range() need to hold a reader lock for the graph. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230203152202.49054-15-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-02-23block: Mark bdrv_co_flush() and callers GRAPH_RDLOCKEmanuele Giuseppe Esposito
This adds GRAPH_RDLOCK annotations to declare that callers of bdrv_co_flush() need to hold a reader lock for the graph. For some places, we know that they will hold the lock, but we don't have the GRAPH_RDLOCK annotations yet. In this case, add assume_graph_lock() with a FIXME comment. These places will be removed once everything is properly annotated. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230203152202.49054-8-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-02-17block/file-posix: don't use functions calling AIO_WAIT_WHILE in worker threadsEmanuele Giuseppe Esposito
When calling bdrv_getlength() in handle_aiocb_write_zeroes(), the function creates a new coroutine and then waits that it finishes using AIO_WAIT_WHILE. The problem is that this function could also run in a worker thread, that has a different AioContext from main loop and iothreads, therefore in AIO_WAIT_WHILE we will have in_aio_context_home_thread(ctx) == false and therefore assert(qemu_get_current_aio_context() == qemu_get_aio_context()); in the else branch will fail, crashing QEMU. Aside from that, bdrv_getlength() is wrong also conceptually, because it reads the BDS graph from another thread and is not protected by any lock. Replace it with raw_co_getlength, that doesn't create a coroutine and doesn't read the BDS graph. Reported-by: Ninad Palsule <ninad@linux.vnet.ibm.com> Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20230209154522.1164401-1-eesposit@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-02-01block/file: Add file-specific image infoHanna Reitz
Add some (optional) information that the file driver can provide for image files, namely the extent size hint. Signed-off-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220620162704.80987-3-hreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-02-01block: Convert bdrv_lock_medium() to co_wrapperEmanuele Giuseppe Esposito
bdrv_lock_medium() is categorized as an I/O function, and it currently doesn't run in a coroutine. We should let it take a graph rdlock since it traverses the block nodes graph, which however is only possible in a coroutine. The only caller of this function is blk_lock_medium(). Therefore make blk_lock_medium() a co_wrapper, so that it always creates a new coroutine, and then make bdrv_lock_medium() a coroutine_fn where the lock can be taken. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230113204212.359076-13-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-02-01block: Convert bdrv_eject() to co_wrapperEmanuele Giuseppe Esposito
bdrv_eject() is categorized as an I/O function, and it currently doesn't run in a coroutine. We should let it take a graph rdlock since it traverses the block nodes graph, which however is only possible in a coroutine. The only caller of this function is blk_eject(). Therefore make blk_eject() a co_wrapper, so that it always creates a new coroutine, and then make bdrv_eject() coroutine_fn where the lock can be taken. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230113204212.359076-12-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-02-01block: Convert bdrv_get_info() to co_wrapper_mixedEmanuele Giuseppe Esposito
bdrv_get_info() is categorized as an I/O function, and it currently doesn't run in a coroutine. We should let it take a graph rdlock since it traverses the block nodes graph, which however is only possible in a coroutine. Therefore turn it into a co_wrapper to move the actual function into a coroutine where the lock can be taken. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230113204212.359076-11-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-02-01block: Convert bdrv_get_allocated_file_size() to co_wrapperEmanuele Giuseppe Esposito
bdrv_get_allocated_file_size() is categorized as an I/O function, and it currently doesn't run in a coroutine. We should let it take a graph rdlock since it traverses the block nodes graph, which however is only possible in a coroutine. Therefore turn it into a co_wrapper to move the actual function into a coroutine where the lock can be taken. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230113204212.359076-10-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-02-01block: Convert bdrv_refresh_total_sectors() to co_wrapper_mixedEmanuele Giuseppe Esposito
BlockDriver->bdrv_getlength is categorized as IO callback, and it currently doesn't run in a coroutine. We should let it take a graph rdlock since the callback traverses the block nodes graph, which however is only possible in a coroutine. Therefore turn it into a co_wrapper to move the actual function into a coroutine where the lock can be taken. Because now this function creates a new coroutine and polls, we need to take the AioContext lock where it is missing, for the only reason that internally co_wrapper calls AIO_WAIT_WHILE and it expects to release the AioContext lock. This is especially messy when a co_wrapper creates a coroutine and polls in bdrv_open_driver, because this function has so many callers in so many context that it can easily lead to deadlocks. Therefore the new rule for bdrv_open_driver is that the caller must always hold the AioContext lock of the given bs (except if it is a coroutine), because the function calls bdrv_refresh_total_sectors() which is now a co_wrapper. Once the rwlock is ultimated and placed in every place it needs to be, we will poll using AIO_WAIT_WHILE_UNLOCKED and remove the AioContext lock. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230113204212.359076-7-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-02-01block: Convert bdrv_is_inserted() to co_wrapperEmanuele Giuseppe Esposito
bdrv_is_inserted() is categorized as an I/O function, and it currently doesn't run in a coroutine. We should let it take a graph rdlock since it traverses the block nodes graph, which however is only possible in a coroutine. Therefore turn it into a co_wrapper to move the actual function into a coroutine where the lock can be taken. At the same time, add also blk_is_inserted as co_wrapper_mixed, since it is called in both coroutine and non-coroutine contexts. Because now this function creates a new coroutine and polls, we need to take the AioContext lock where it is missing, for the only reason that internally c_w_mixed_bdrv_rdlock calls AIO_WAIT_WHILE and it expects to release the AioContext lock. Once the rwlock is ultimated and placed in every place it needs to be, we will poll using AIO_WAIT_WHILE_UNLOCKED and remove the AioContext lock. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230113204212.359076-5-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-02-01block: Convert bdrv_io_unplug() to co_wrapperEmanuele Giuseppe Esposito
BlockDriver->bdrv_io_unplug is categorized as IO callback, and it currently doesn't run in a coroutine. We should let it take a graph rdlock since the callback traverses the block nodes graph, which however is only possible in a coroutine. The only caller of this function is blk_io_unplug(), therefore make blk_io_unplug() a co_wrapper, so that we're always running in a coroutine where the lock can be taken. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230113204212.359076-4-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-02-01block: Convert bdrv_io_plug() to co_wrapperEmanuele Giuseppe Esposito
BlockDriver->bdrv_io_plug is categorized as IO callback, and it currently doesn't run in a coroutine. We should let it take a graph rdlock since the callback traverses the block nodes graph, which however is only possible in a coroutine. The only caller of this function is blk_io_plug(), therefore make blk_io_plug() a co_wrapper, so that we're always running in a coroutine where the lock can be taken. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230113204212.359076-3-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-01-20include/block: Untangle inclusion loopsMarkus Armbruster
We have two inclusion loops: block/block.h -> block/block-global-state.h -> block/block-common.h -> block/blockjob.h -> block/block.h block/block.h -> block/block-io.h -> block/block-common.h -> block/blockjob.h -> block/block.h I believe these go back to Emanuele's reorganization of the block API, merged a few months ago in commit d7e2fe4aac8. Fortunately, breaking them is merely a matter of deleting unnecessary includes from headers, and adding them back in places where they are now missing. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20221221133551.3967339-2-armbru@redhat.com>
2023-01-09error handling: Use RETRY_ON_EINTR() macro where applicableNikita Ivanov
There is a defined RETRY_ON_EINTR() macro in qemu/osdep.h which handles the same while loop. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/415 Signed-off-by: Nikita Ivanov <nivanov@cloudlinux.com> Message-Id: <20221023090422.242617-3-nivanov@cloudlinux.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> [thuth: Dropped the hunk that changed socket_accept() in libqtest.c] Signed-off-by: Thomas Huth <thuth@redhat.com>
2022-10-26block: add BDRV_REQ_REGISTERED_BUF request flagStefan Hajnoczi
Block drivers may optimize I/O requests accessing buffers previously registered with bdrv_register_buf(). Checking whether all elements of a request's QEMUIOVector are within previously registered buffers is expensive, so we need a hint from the user to avoid costly checks. Add a BDRV_REQ_REGISTERED_BUF request flag to indicate that all QEMUIOVector elements in an I/O request are known to be within previously registered buffers. Always pass the flag through to driver read/write functions. There is little harm in passing the flag to a driver that does not use it. Passing the flag to drivers avoids changes across many block drivers. Filter drivers would need to explicitly support the flag and pass through to their children when the children support it. That's a lot of code changes and it's hard to remember to do that everywhere, leading to silent reduced performance when the flag is accidentally dropped. The only problematic scenario with the approach in this patch is when a driver passes the flag through to internal I/O requests that don't use the same I/O buffer. In that case the hint may be set when it should actually be clear. This is a rare case though so the risk is low. Some drivers have assert(!flags), which no longer works when BDRV_REQ_REGISTERED_BUF is passed in. These assertions aren't very useful anyway since the functions are called almost exclusively by bdrv_driver_preadv/pwritev() so if we get flags handling right there then the assertion is not needed. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20221013185908.1297568-7-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-10-07file-posix: Remove unused s->discard_zeroesKevin Wolf
The field is unused (only ever set, but never read) since commit ac9185603. Additionally, the commit message of commit 34fa110e already explained earlier why it's unreliable. Remove it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20220923142838.91043-1-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-10-07file-posix: add missing coroutine_fn annotationsPaolo Bonzini
Callers of coroutine_fn must be coroutine_fn themselves, or the call must be within "if (qemu_in_coroutine())". Apply coroutine_fn to functions where this holds. Reviewed-by: Alberto Faria <afaria@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220922084924.201610-9-pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-09-30block: use the request length for iov alignmentKeith Busch
An iov length needs to be aligned to the logical block size, which may be larger than the memory alignment. Tested-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Keith Busch <kbusch@kernel.org> Message-Id: <20220929200523.3218710-3-kbusch@meta.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-09-30block: move bdrv_qiov_is_aligned to file-posixKeith Busch
There is only user of bdrv_qiov_is_aligned(), so move the alignment function to there and make it static. Signed-off-by: Keith Busch <kbusch@kernel.org> Message-Id: <20220929200523.3218710-2-kbusch@meta.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-09-30block: use bdrv_is_sg() helper instead of raw bs->sg readingDenis V. Lunev
I believe that if the helper exists, it must be used always for reading of the value. It breaks expectations in the other case. Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Kevin Wolf <kwolf@redhat.com> CC: Hanna Reitz <hreitz@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Fam Zheng <fam@euphon.net> CC: Ronnie Sahlberg <ronniesahlberg@gmail.com> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Peter Lieven <pl@kamp.de> CC: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-Id: <20220817083736.40981-2-den@openvz.org> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-05-03block: move fcntl_setfl()Marc-André Lureau
It is only used by block/file-posix.c, move it there. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2022-04-06Remove qemu-common.h include from most unitsMarc-André Lureau
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20220323155743.1585078-33-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-06Replace qemu_real_host_page variables with inlined functionsMarc-André Lureau
Replace the global variables with inlined helper functions. getpagesize() is very likely annotated with a "const" function attribute (at least with glibc), and thus optimization should apply even better. This avoids the need for a constructor initialization too. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20220323155743.1585078-12-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-15block/file-posix: Remove a deprecation warning on macOS 12Philippe Mathieu-Daudé
When building on macOS 12 we get: block/file-posix.c:3335:18: warning: 'IOMasterPort' is deprecated: first deprecated in macOS 12.0 [-Wdeprecated-declarations] kernResult = IOMasterPort( MACH_PORT_NULL, &masterPort ); ^~~~~~~~~~~~ IOMainPort Replace by IOMainPort, redefining it to IOMasterPort if not available. Suggested-by: Akihiko Odaki <akihiko.odaki@gmail.com> Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Reviewed by: Cameron Esfahani <dirty@apple.com> Reviewed-by: Akihiko Odaki <akihiko.odaki@gmail.com> Tested-by: Akihiko Odaki <akihiko.odaki@gmail.com> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>