aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-09-16virtiofsd: Reverse req_list before processing itSergio Lopez
With the thread pool disabled, we add the requests in the queue to a GList, processing by iterating over there afterwards. For adding them, we're using "g_list_prepend()", which is more efficient but causes the requests to be processed in reverse order, breaking the read-ahead and request-merging optimizations in the host for sequential operations. According to the documentation, if you need to process the request in-order, using "g_list_prepend()" and then reversing the list with "g_list_reverse()" is more efficient than using "g_list_append()", so let's do it that way. Testing on a spinning disk (to boost the increase of read-ahead and request-merging) shows a 4x improvement on sequential write fio test: Test: fio --directory=/mnt/virtio-fs --filename=fio-file1 --runtime=20 --iodepth=16 --size=4G --direct=1 --blocksize=4K --ioengine libaio --rw write --name seqwrite-libaio Without "g_list_reverse()": ... Jobs: 1 (f=1): [W(1)][100.0%][w=22.4MiB/s][w=5735 IOPS][eta 00m:00s] seqwrite-libaio: (groupid=0, jobs=1): err= 0: pid=710: Tue Aug 24 12:58:16 2021 write: IOPS=5709, BW=22.3MiB/s (23.4MB/s)(446MiB/20002msec); 0 zone resets ... With "g_list_reverse()": ... Jobs: 1 (f=1): [W(1)][100.0%][w=84.0MiB/s][w=21.5k IOPS][eta 00m:00s] seqwrite-libaio: (groupid=0, jobs=1): err= 0: pid=716: Tue Aug 24 13:00:15 2021 write: IOPS=21.3k, BW=83.1MiB/s (87.2MB/s)(1663MiB/20001msec); 0 zone resets ... Signed-off-by: Sergio Lopez <slp@redhat.com> Message-Id: <20210824131158.39970-1-slp@redhat.com> Reviewed-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-09-16tools/virtiofsd: Add fstatfs64 syscall to the seccomp allowlistThomas Huth
The virtiofsd currently crashes on s390x when doing something like this in the guest: mkdir -p /mnt/myfs mount -t virtiofs myfs /mnt/myfs touch /mnt/myfs/foo.txt stat -f /mnt/myfs/foo.txt The problem is that the fstatfs64 syscall is called in this case from the virtiofsd. We have to put it on the seccomp allowlist to avoid that the daemon gets killed in this case. Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2001728 Suggested-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20210914123214.181885-1-thuth@redhat.com> Reviewed-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-09-15Merge remote-tracking branch 'remotes/hreitz/tags/pull-block-2021-09-15' ↵Peter Maydell
into staging Block patches: - Block-status cache for data regions - qcow2 optimization (when using subclusters) - iotests delinting, and let 297 (lint checker) cover named iotests - qcow2 check improvements - Added -F (target backing file format) option to qemu-img convert - Mirror job fix - Fix for when a migration is initiated while a backup job runs - Fix for uncached qemu-img convert to a volume with 4k sectors (for an unaligned image) - Minor gluster driver fix # gpg: Signature made Wed 15 Sep 2021 18:39:11 BST # gpg: using RSA key CB62D7A0EE3829E45F004D34A1FA40D098019CDF # gpg: issuer "hreitz@redhat.com" # gpg: Good signature from "Hanna Reitz <hreitz@redhat.com>" [marginal] # gpg: WARNING: This key is not certified with sufficiently trusted signatures! # gpg: It is not certain that the signature belongs to the owner. # Primary key fingerprint: CB62 D7A0 EE38 29E4 5F00 4D34 A1FA 40D0 9801 9CDF * remotes/hreitz/tags/pull-block-2021-09-15: (32 commits) qemu-img: Add -F shorthand to convert qcow2-refcount: check_refblocks(): add separate message for reserved qcow2-refcount: check_refcounts_l1(): check reserved bits qcow2-refcount: improve style of check_refcounts_l1() qcow2-refcount: check_refcounts_l2(): check reserved bits qcow2-refcount: check_refcounts_l2(): check l2_bitmap qcow2-refcount: fix_l2_entry_by_zero(): also zero L2 entry bitmap qcow2-refcount: introduce fix_l2_entry_by_zero() qcow2: introduce qcow2_parse_compressed_l2_entry() helper qcow2: compressed read: simplify cluster descriptor passing qcow2-refcount: improve style of check_refcounts_l2() qemu-img: Allow target be aligned to sector size qcow2: handle_dependencies(): relax conflict detection qcow2: refactor handle_dependencies() loop body simplebench: add img_bench_templater.py block: bdrv_inactivate_recurse(): check for permissions and fix crash tests: add migrate-during-backup block/mirror: fix NULL pointer dereference in mirror_wait_on_conflicts() iotests/297: Cover tests/ mirror-top-perms: Fix AbnormalShutdown path ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-09-15qemu-img: Add -F shorthand to convertEric Blake
Although we have long supported 'qemu-img convert -o backing_file=foo,backing_fmt=bar', the fact that we have a shortcut -B for backing_file but none for backing_fmt has made it more likely that users accidentally run into: qemu-img: warning: Deprecated use of backing file without explicit backing format when using -B instead of -o. For similarity with other qemu-img commands, such as create and compare, add '-F $fmt' as the shorthand for '-o backing_fmt=$fmt'. Update iotest 122 for coverage of both spellings. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20210913131735.1948339-1-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-15qcow2-refcount: check_refblocks(): add separate message for reservedVladimir Sementsov-Ogievskiy
Split checking for reserved bits out of aligned offset check. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Tested-by: Kirill Tkhai <ktkhai@virtuozzo.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20210914122454.141075-11-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-15qcow2-refcount: check_refcounts_l1(): check reserved bitsVladimir Sementsov-Ogievskiy
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Tested-by: Kirill Tkhai <ktkhai@virtuozzo.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20210914122454.141075-10-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-15qcow2-refcount: improve style of check_refcounts_l1()Vladimir Sementsov-Ogievskiy
- use g_autofree for l1_table - better name for size in bytes variable - reduce code blocks nesting - whitespaces, braces, newlines Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20210914122454.141075-9-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-15qcow2-refcount: check_refcounts_l2(): check reserved bitsVladimir Sementsov-Ogievskiy
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Tested-by: Kirill Tkhai <ktkhai@virtuozzo.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20210914122454.141075-8-vsementsov@virtuozzo.com> [hreitz: Separated `type` declaration from statements] Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-15qcow2-refcount: check_refcounts_l2(): check l2_bitmapVladimir Sementsov-Ogievskiy
Check subcluster bitmap of the l2 entry for different types of clusters: - for compressed it must be zero - for allocated check consistency of two parts of the bitmap - for unallocated all subclusters should be unallocated (or zero-plain) Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Tested-by: Kirill Tkhai <ktkhai@virtuozzo.com> Message-Id: <20210914122454.141075-7-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-15qcow2-refcount: fix_l2_entry_by_zero(): also zero L2 entry bitmapVladimir Sementsov-Ogievskiy
We'll reuse the function to fix wrong L2 entry bitmap. Support it now. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20210914122454.141075-6-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-15qcow2-refcount: introduce fix_l2_entry_by_zero()Vladimir Sementsov-Ogievskiy
Split fix_l2_entry_by_zero() out of check_refcounts_l2() to be reused in further patch. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20210914122454.141075-5-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-15qcow2: introduce qcow2_parse_compressed_l2_entry() helperVladimir Sementsov-Ogievskiy
Add helper to parse compressed l2_entry and use it everywhere instead of open-coding. Note, that in most places we move to precise coffset/csize instead of sector-aligned. Still it should work good enough for updating refcounts. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20210914122454.141075-4-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-15qcow2: compressed read: simplify cluster descriptor passingVladimir Sementsov-Ogievskiy
Let's pass the whole L2 entry and not bother with L2E_COMPRESSED_OFFSET_SIZE_MASK. It also helps further refactoring that adds generic qcow2_parse_compressed_l2_entry() helper. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20210914122454.141075-3-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-15qcow2-refcount: improve style of check_refcounts_l2()Vladimir Sementsov-Ogievskiy
- don't use same name for size in bytes and in entries - use g_autofree for l2_table - add whitespace - fix block comment style Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20210914122454.141075-2-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-15gitlab-ci: Mark manual-only jobs as allow_failurePeter Maydell
If a gitlab CI job is marked as manual-only but is not marked as allow_failure, then gitlab considers that the pipeline is "blocked" until the job has been manually triggered. We need to mark these manual-only jobs as also allow_failure: true so that gitlab doesn't insist that they have run before it will consider the pipeline to be complete. Fixes: 4c9af1ea1457782cf0adb29 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-id: 20210915123412.8232-1-peter.maydell@linaro.org Acked-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Willian Rampazzo <willianr@redhat.com>
2021-09-15qemu-img: Allow target be aligned to sector sizeHanna Reitz
We cannot write to images opened with O_DIRECT unless we allow them to be resized so they are aligned to the sector size: Since 9c60a5d1978, bdrv_node_refresh_perm() ensures that for nodes whose length is not aligned to the request alignment and where someone has taken a WRITE permission, the RESIZE permission is taken, too). Let qemu-img convert pass the BDRV_O_RESIZE flag (which causes blk_new_open() to take the RESIZE permission) when using cache=none for the target, so that when writing to it, it can be aligned to the target sector size. Without this patch, an error is returned: $ qemu-img convert -f raw -O raw -t none foo.img /mnt/tmp/foo.img qemu-img: Could not open '/mnt/tmp/foo.img': Cannot get 'write' permission without 'resize': Image size is not a multiple of request alignment Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1994266 Signed-off-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20210819101200.64235-1-hreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2021-09-15qcow2: handle_dependencies(): relax conflict detectionVladimir Sementsov-Ogievskiy
There is no conflict and no dependency if we have parallel writes to different subclusters of one cluster when the cluster itself is already allocated. So, relax extra dependency. Measure performance: First, prepare build/qemu-img-old and build/qemu-img-new images. cd scripts/simplebench ./img_bench_templater.py Paste the following to stdin of running script: qemu_img=../../build/qemu-img-{old|new} $qemu_img create -f qcow2 -o extended_l2=on /ssd/x.qcow2 1G $qemu_img bench -c 100000 -d 8 [-s 2K|-s 2K -o 512|-s $((1024*2+512))] \ -w -t none -n /ssd/x.qcow2 The result: All results are in seconds ------------------ --------- --------- old new -s 2K 6.7 ± 15% 6.2 ± 12% -7% -s 2K -o 512 13 ± 3% 11 ± 5% -16% -s $((1024*2+512)) 9.5 ± 4% 8.4 -12% ------------------ --------- --------- So small writes are more independent now and that helps to keep deeper io queue which improves performance. 271 iotest output becomes racy for three allocation in one cluster. Second and third writes may finish in different order. Second and third requests don't depend on each other any more. Still they both depend on first request anyway. Filter out second and third write offsets to cover both possible outputs. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210824101517.59802-4-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> [hreitz: s/ an / and /] Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-15qcow2: refactor handle_dependencies() loop bodyVladimir Sementsov-Ogievskiy
No logic change, just prepare for the following commit. While being here do also small grammar fix in a comment. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20210824101517.59802-3-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-15simplebench: add img_bench_templater.pyVladimir Sementsov-Ogievskiy
Add simple grammar-parsing template benchmark. New tool consume test template written in bash with some special grammar injections and produces multiple tests, run them and finally print a performance comparison table of different tests produced from one template. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210824101517.59802-2-vsementsov@virtuozzo.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-15block: bdrv_inactivate_recurse(): check for permissions and fix crashVladimir Sementsov-Ogievskiy
We must not inactivate child when parent has write permissions on it. Calling .bdrv_inactivate() doesn't help: actually only qcow2 has this handler and it is used to flush caches, not for permission manipulations. So, let's simply check cumulative parent permissions before inactivating the node. This commit fixes a crash when we do migration during backup: prior to the commit nothing prevents all nodes inactivation at migration finish and following backup write to the target crashes on assertion "assert(!(bs->open_flags & BDRV_O_INACTIVE));" in bdrv_co_write_req_prepare(). After the commit, we rely on the fact that copy-before-write filter keeps write permission on target node to be able to write to it. So inactivation fails and migration fails as expected. Corresponding test now passes, so, enable it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20210911120027.8063-3-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-15tests: add migrate-during-backupVladimir Sementsov-Ogievskiy
Add a simple test which tries to run migration during backup. bdrv_inactivate_all() should fail. But due to bug (see next commit with fix) it doesn't, nodes are inactivated and continued backup crashes on assertion "assert(!(bs->open_flags & BDRV_O_INACTIVE));" in bdrv_co_write_req_prepare(). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210911120027.8063-2-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-15block/mirror: fix NULL pointer dereference in mirror_wait_on_conflicts()Stefano Garzarella
In mirror_iteration() we call mirror_wait_on_conflicts() with `self` parameter set to NULL. Starting from commit d44dae1a7c we dereference `self` pointer in mirror_wait_on_conflicts() without checks if it is not NULL. Backtrace: Program terminated with signal SIGSEGV, Segmentation fault. #0 mirror_wait_on_conflicts (self=0x0, s=<optimized out>, offset=<optimized out>, bytes=<optimized out>) at ../block/mirror.c:172 172 self->waiting_for_op = op; [Current thread is 1 (Thread 0x7f0908931ec0 (LWP 380249))] (gdb) bt #0 mirror_wait_on_conflicts (self=0x0, s=<optimized out>, offset=<optimized out>, bytes=<optimized out>) at ../block/mirror.c:172 #1 0x00005610c5d9d631 in mirror_run (job=0x5610c76a2c00, errp=<optimized out>) at ../block/mirror.c:491 #2 0x00005610c5d58726 in job_co_entry (opaque=0x5610c76a2c00) at ../job.c:917 #3 0x00005610c5f046c6 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at ../util/coroutine-ucontext.c:173 #4 0x00007f0909975820 in ?? () at ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91 from /usr/lib64/libc.so.6 Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2001404 Fixes: d44dae1a7c ("block/mirror: fix active mirror dead-lock in mirror_wait_on_conflicts") Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20210910124533.288318-1-sgarzare@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-15iotests/297: Cover tests/Hanna Reitz
297 so far does not check the named tests, which reside in the tests/ directory (i.e. full path tests/qemu-iotests/tests). Fix it. Thanks to the previous two commits, all named tests pass its scrutiny, so we do not have to add anything to SKIP_FILES. Signed-off-by: Hanna Reitz <hreitz@redhat.com> Reviewed-by: Willian Rampazzo <willianr@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210902094017.32902-6-hreitz@redhat.com>
2021-09-15mirror-top-perms: Fix AbnormalShutdown pathHanna Reitz
The AbnormalShutdown exception class is not in qemu.machine, but in qemu.machine.machine. (qemu.machine.AbnormalShutdown was enough for Python to find it in order to run this test, but pylint complains about it.) Signed-off-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20210902094017.32902-5-hreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2021-09-15migrate-bitmaps-test: Fix pylint warningsHanna Reitz
There are a couple of things pylint takes issue with: - The "time" import is unused - The import order (iotests should come last) - get_bitmap_hash() doesn't use @self and so should be a function - Semicolons at the end of some lines - Parentheses after "if" - Some lines are too long (80 characters instead of 79) - inject_test_case()'s @name parameter shadows a top-level @name variable - "lambda self: mc(self)" were equivalent to just "mc", but in inject_test_case(), it is not equivalent, so add a comment and disable the warning locally - Always put two empty lines after a function - f'exec: cat > /dev/null' does not need to be an f-string Fix them. Signed-off-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20210902094017.32902-4-hreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2021-09-15migrate-bitmaps-postcopy-test: Fix pylint warningsHanna Reitz
pylint complains that discards1_sha256 and all_discards_sha256 are first set in non-__init__ methods. These variables are not really class-variables anyway, so let them instead be returned by start_postcopy(), thus silencing pylint. Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210902094017.32902-3-hreitz@redhat.com>
2021-09-15iotests/297: Drop 169 and 199 from the skip listHanna Reitz
169 and 199 have been renamed and moved to tests/ (commit a44be0334be: "iotests: rename and move 169 and 199 tests"), so we can drop them from the skip list. Signed-off-by: Hanna Reitz <hreitz@redhat.com> Reviewed-by: Willian Rampazzo <willianr@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210902094017.32902-2-hreitz@redhat.com>
2021-09-15iotests: Fix use-{list,dict}-literal warningsHanna Reitz
pylint proposes using `[]` instead of `list()` and `{}` instead of `dict()`, because it is faster. That seems simple enough, so heed its advice. Signed-off-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20210824153540.177128-3-hreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2021-09-15iotests: Fix unspecified-encoding pylint warningsHanna Reitz
As of recently, pylint complains when `open()` calls are missing an `encoding=` specified. Everything we have should be UTF-8 (and in fact, everything should be UTF-8, period (exceptions apply)), so use that. Signed-off-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20210824153540.177128-2-hreitz@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>
2021-09-15block/iscsi: Do not force-cap *pnumHanna Reitz
bdrv_co_block_status() does it for us, we do not need to do it here. The advantage of not capping *pnum is that bdrv_co_block_status() can cache larger data regions than requested by its caller. Signed-off-by: Hanna Reitz <hreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210812084148.14458-7-hreitz@redhat.com>
2021-09-15block/gluster: Do not force-cap *pnumHanna Reitz
bdrv_co_block_status() does it for us, we do not need to do it here. The advantage of not capping *pnum is that bdrv_co_block_status() can cache larger data regions than requested by its caller. Signed-off-by: Hanna Reitz <hreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210812084148.14458-6-hreitz@redhat.com>
2021-09-15block/file-posix: Do not force-cap *pnumHanna Reitz
bdrv_co_block_status() does it for us, we do not need to do it here. The advantage of not capping *pnum is that bdrv_co_block_status() can cache larger data regions than requested by its caller. Signed-off-by: Hanna Reitz <hreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210812084148.14458-5-hreitz@redhat.com>
2021-09-15block: Clarify that @bytes is no limit on *pnumHanna Reitz
.bdrv_co_block_status() implementations are free to return a *pnum that exceeds @bytes, because bdrv_co_block_status() in block/io.c will clamp *pnum as necessary. On the other hand, if drivers' implementations return values for *pnum that are as large as possible, our recently introduced block-status cache will become more effective. So, make a note in block_int.h that @bytes is no upper limit for *pnum. Suggested-by: Eric Blake <eblake@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210812084148.14458-4-hreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2021-09-15block: block-status cache for data regionsHanna Reitz
As we have attempted before (https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg06451.html, "file-posix: Cache lseek result for data regions"; https://lists.nongnu.org/archive/html/qemu-block/2021-02/msg00934.html, "file-posix: Cache next hole"), this patch seeks to reduce the number of SEEK_DATA/HOLE operations the file-posix driver has to perform. The main difference is that this time it is implemented as part of the general block layer code. The problem we face is that on some filesystems or in some circumstances, SEEK_DATA/HOLE is unreasonably slow. Given the implementation is outside of qemu, there is little we can do about its performance. We have already introduced the want_zero parameter to bdrv_co_block_status() to reduce the number of SEEK_DATA/HOLE calls unless we really want zero information; but sometimes we do want that information, because for files that consist largely of zero areas, special-casing those areas can give large performance boosts. So the real problem is with files that consist largely of data, so that inquiring the block status does not gain us much performance, but where such an inquiry itself takes a lot of time. To address this, we want to cache data regions. Most of the time, when bad performance is reported, it is in places where the image is iterated over from start to end (qemu-img convert or the mirror job), so a simple yet effective solution is to cache only the current data region. (Note that only caching data regions but not zero regions means that returning false information from the cache is not catastrophic: Treating zeroes as data is fine. While we try to invalidate the cache on zero writes and discards, such incongruences may still occur when there are other processes writing to the image.) We only use the cache for nodes without children (i.e. protocol nodes), because that is where the problem is: Drivers that rely on block-status implementations outside of qemu (e.g. SEEK_DATA/HOLE). Resolves: https://gitlab.com/qemu-project/qemu/-/issues/307 Signed-off-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20210812084148.14458-3-hreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [hreitz: Added `local_file == bs` assertion, as suggested by Vladimir] Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-15block: Drop BDS comment regarding bdrv_append()Hanna Reitz
There is a comment above the BDS definition stating care must be taken to consider handling newly added fields in bdrv_append(). Actually, this comment should have said "bdrv_swap()" as of 4ddc07cac (nine years ago), and in any case, bdrv_swap() was dropped in 8e419aefa (six years ago). So no such care is necessary anymore. Signed-off-by: Hanna Reitz <hreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210812084148.14458-2-hreitz@redhat.com>
2021-09-15gluster: Align block-status tailMax Reitz
gluster's block-status implementation is basically a copy of that in block/file-posix.c, there is only one thing missing, and that is aligning trailing data extents to the request alignment (as added by commit 9c3db310ff0). Note that 9c3db310ff0 mentions that "there seems to be no other block driver that sets request_alignment and [...]", but while block/gluster.c does indeed not set request_alignment, block/io.c's bdrv_refresh_limits() will still default to an alignment of 512 because block/gluster.c does not provide a byte-aligned read function. Therefore, unaligned tails can conceivably occur, and so we should apply the change from 9c3db310ff0 to gluster's block-status implementation. Reported-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210805143603.59503-1-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-15Merge remote-tracking branch 'remotes/rth-gitlab/tags/pull-tcg-20210914-4' ↵Peter Maydell
into staging Fix translation race condition for user-only. Fix tcg/i386 encoding for VPSLLVQ, VPSRLVQ. Fix tcg/arm tcg_out_vec_op signature. Fix tcg/ppc (32bit) build with clang. Remove dupluate TCG_KICK_PERIOD definition. Remove unused tcg_global_reg_new. Restrict cpu_exec_interrupt and its callees to sysemu. Cleanups for tcg/arm. # gpg: Signature made Tue 14 Sep 2021 20:28:35 BST # gpg: using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F # gpg: issuer "richard.henderson@linaro.org" # gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [full] # Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A 05C0 64DF 38E8 AF7E 215F * remotes/rth-gitlab/tags/pull-tcg-20210914-4: (43 commits) tcg/arm: More use of the TCGReg enum tcg/arm: More use of the ARMInsn enum tcg/arm: Give enum arm_cond_code_e a typedef and use it tcg/arm: Drop inline markers tcg/arm: Simplify usage of encode_imm tcg/arm: Split out tcg_out_ldstm tcg/arm: Support armv4t in tcg_out_goto and tcg_out_call tcg/arm: Simplify use_armv5t_instructions tcg/arm: Standardize on tcg_out_<branch>_{reg,imm} tcg/arm: Remove fallback definition of __ARM_ARCH accel/tcg/user-exec: Fix read-modify-write of code on s390 hosts user: Remove cpu_get_pic_interrupt() stubs accel/tcg: Restrict TCGCPUOps::cpu_exec_interrupt() to sysemu target/xtensa: Restrict cpu_exec_interrupt() handler to sysemu target/rx: Restrict cpu_exec_interrupt() handler to sysemu target/sparc: Restrict cpu_exec_interrupt() handler to sysemu target/sh4: Restrict cpu_exec_interrupt() handler to sysemu target/riscv: Restrict cpu_exec_interrupt() handler to sysemu target/ppc: Restrict cpu_exec_interrupt() handler to sysemu target/openrisc: Restrict cpu_exec_interrupt() handler to sysemu ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-09-14tcg/arm: More use of the TCGReg enumRichard Henderson
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-09-14tcg/arm: More use of the ARMInsn enumRichard Henderson
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-09-14tcg/arm: Give enum arm_cond_code_e a typedef and use itRichard Henderson
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-09-14tcg/arm: Drop inline markersRichard Henderson
Let the compiler decide about inlining. Remove tcg_out_nop as unused. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-09-14tcg/arm: Simplify usage of encode_immRichard Henderson
We have already computed the rotated value of the imm8 portion of the complete imm12 encoding. No sense leaving the combination of rot + rotation to the caller. Create an encode_imm12_nofail helper that performs an assert. This removes the final use of the local "rotl" function, which duplicated our generic "rol32" function. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-09-14tcg/arm: Split out tcg_out_ldstmRichard Henderson
Expand these hard-coded instructions symbolically. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-09-14tcg/arm: Support armv4t in tcg_out_goto and tcg_out_callRichard Henderson
ARMv4T has BX as its only interworking instruction. In order to support testing of different architecture revisions with a qemu binary that may have been built for, say ARMv6T2, fill in the blank required to make calls to helpers in thumb mode. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-09-14tcg/arm: Simplify use_armv5t_instructionsRichard Henderson
According to the Arm ARM DDI 0406C, section A1.3, the valid variants are ARMv5T, ARMv5TE, ARMv5TEJ -- there is no ARMv5 without Thumb. Therefore simplify the test from preprocessor ifdefs to base architecture revision. Retain the "t" in the name to minimize churn. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-09-14tcg/arm: Standardize on tcg_out_<branch>_{reg,imm}Richard Henderson
Some of the functions specified _reg, some _imm, and some left it blank. Make it clearer to which we are referring. Split tcg_out_b_reg from tcg_out_bx_reg, to indicate when we do not actually require BX semantics. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-09-14tcg/arm: Remove fallback definition of __ARM_ARCHRichard Henderson
GCC since 4.8 provides the definition and we now require 7.5. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-09-14accel/tcg/user-exec: Fix read-modify-write of code on s390 hostsIlya Leoshkevich
x86_64 dotnet/runtime uses cmpxchg for code patching. When running it under s390x qemu-linux user, cpu_signal_handler() does not recognize this as a write and does not restore PAGE_WRITE cleared by tb_page_add(), incorrectly forwarding the signal to the guest code. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20210803221606.150103-1-iii@linux.ibm.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-09-14user: Remove cpu_get_pic_interrupt() stubsPhilippe Mathieu-Daudé
cpu_get_pic_interrupt() is now unreachable from user-mode, delete the unnecessary stubs. Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Warner Losh <imp@bsdimp.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20210911165434.531552-25-f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-09-14accel/tcg: Restrict TCGCPUOps::cpu_exec_interrupt() to sysemuPhilippe Mathieu-Daudé
All targets call TCGCPUOps::cpu_exec_interrupt() from sysemu code. Move its declaration to restrict it to system emulation. Extend the code guarded. Restrict the static inlined need_replay_interrupt() method to avoid a "defined but not used" warning. Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20210911165434.531552-24-f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>