aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-06-24contrib/plugins: add Instructions Per Second (IPS) example for cost modelingPierrick Bouvier
This plugin uses the new time control interface to make decisions about the state of time during the emulation. The algorithm is currently very simple. The user specifies an ips rate which applies per core. If the core runs ahead of its allocated execution time the plugin sleeps for a bit to let real time catch up. Either way time is updated for the emulation as a function of total executed instructions with some adjustments for cores that idle. Examples -------- Slow down execution of /bin/true: $ num_insn=$(./build/qemu-x86_64 -plugin ./build/tests/plugin/libinsn.so -d plugin /bin/true |& grep total | sed -e 's/.*: //') $ time ./build/qemu-x86_64 -plugin ./build/contrib/plugins/libips.so,ips=$(($num_insn/4)) /bin/true real 4.000s Boot a Linux kernel simulating a 250MHz cpu: $ /build/qemu-system-x86_64 -kernel /boot/vmlinuz-6.1.0-21-amd64 -append "console=ttyS0" -plugin ./build/contrib/plugins/libips.so,ips=$((250*1000*1000)) -smp 1 -m 512 check time until kernel panic on serial0 Tested in system mode by booting a full debian system, and using: $ sysbench cpu run Performance decrease linearly with the given number of ips. Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Message-Id: <20240530220610.1245424-7-pierrick.bouvier@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20240620152220.2192768-11-alex.bennee@linaro.org>
2024-06-24plugins: add migration blockerAlex Bennée
If the plugin in controlling time there is some state that might be missing from the plugin tracking it. Migration is unlikely to work in this case so lets put a migration blocker in to let the user know if they try. Suggested-by: Dr. David Alan Gilbert <dave@treblig.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20240620152220.2192768-10-alex.bennee@linaro.org>
2024-06-24plugins: add time control APIAlex Bennée
Expose the ability to control time through the plugin API. Only one plugin can control time so it has to request control when loaded. There are probably more corner cases to catch here. Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> [AJB: tweaked user-mode handling, merged QEMU_PLUGIN_API fix] Message-Id: <20240530220610.1245424-6-pierrick.bouvier@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20240620152220.2192768-9-alex.bennee@linaro.org>
2024-06-24qtest: move qtest_{get, set}_virtual_clock to accel/qtest/qtest.cPierrick Bouvier
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20240530220610.1245424-5-pierrick.bouvier@linaro.org> Message-Id: <20240620152220.2192768-8-alex.bennee@linaro.org>
2024-06-24sysemu: generalise qtest_warp_clock as qemu_clock_advance_virtual_timeAlex Bennée
Move the key functionality of moving time forward into the clock sub-system itself. This will allow us to plumb in time control into plugins. Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20240620152220.2192768-7-alex.bennee@linaro.org>
2024-06-24qtest: use cpu interface in qtest_clock_warpAlex Bennée
This generalises the qtest_clock_warp code to use the AccelOps handlers for updating its own sense of time. This will make the next patch which moves the warp code closer to pure code motion. From: Alex Bennée <alex.bennee@linaro.org> Acked-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Message-Id: <20240530220610.1245424-3-pierrick.bouvier@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20240620152220.2192768-6-alex.bennee@linaro.org>
2024-06-24sysemu: add set_virtual_time to accel opsAlex Bennée
We are about to remove direct calls to individual accelerators for this information and will need a central point for plugins to hook into time changes. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20240530220610.1245424-2-pierrick.bouvier@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20240620152220.2192768-5-alex.bennee@linaro.org>
2024-06-24plugins: Ensure register handles are not NULLAkihiko Odaki
Ensure register handles are not NULL so that a plugin can assume NULL is invalid as a register handle. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Message-Id: <20240229-null-v1-1-e716501d981e@daynix.com> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20240620152220.2192768-4-alex.bennee@linaro.org>
2024-06-24gdbstub: move enums into separate headerAlex Bennée
This is an experiment to further reduce the amount we throw into the exec headers. It might not be as useful as I initially thought because just under half of the users also need gdbserver_start(). Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20240620152220.2192768-3-alex.bennee@linaro.org>
2024-06-24include/exec: add missing include guard commentAlex Bennée
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20240620152220.2192768-2-alex.bennee@linaro.org>
2024-06-22Merge tag 'pull-target-arm-20240622' of ↵Richard Henderson
https://git.linaro.org/people/pmaydell/qemu-arm into staging target-arm queue: * hw/net/can/xlnx-versal-canfd: Fix sorting of the tx queue * hw/arm/xilinx_zynq: Fix IRQ/FIQ routing * hw/intc/arm_gic: Fix deactivation of SPI lines * hw/timer/a9gtimer: Handle QTest mode in a9_gtimer_get_current_cpu * hw/misc: Set valid access size for Exynos4210 RNG * hw/arm/sbsa-ref: switch to 1GHz timer frequency * hw/arm/sbsa-ref: Enable CPU cluster on ARM sbsa machine * hw/arm/virt: allow creation of a second NonSecure UART * hw/arm/virt: Avoid unexpected warning from Linux guest on host with Fujitsu CPUs * scripts/coverity-scan/COMPONENTS.md: update component regexes * hw/usb/hcd-dwc2: Handle invalid address access in read and write functions * hw/usb/hcd-ohci: Fix ohci_service_td: accept zero-length TDs where CBP=BE+1 # -----BEGIN PGP SIGNATURE----- # # iQJNBAABCAA3FiEE4aXFk81BneKOgxXPPCUl7RQ2DN4FAmZ2vigZHHBldGVyLm1h # eWRlbGxAbGluYXJvLm9yZwAKCRA8JSXtFDYM3mRzD/9+Upo0E9GoNE8FaZYk+xw9 # tB7V0C5RxZCW74ggjsoRSs2Mq45X+jzjT5cmlo3bCyj9z146eyOovcqroJHlggy7 # W3nqE7Yg6tUz6MEbrDq54BVNGmBdwY4kpYr5MvXrhtb9A+/QjaW8MqlmT5NCvUb+ # KZ+i4PTAF5dALCZblnqL5+9RYfwMOeR8R03ZbV2H0OCvO16N1rWsgoRzReVbpmy2 # LEXGber13O7HnSRiMjvxTn92yZBO+tgmLB5w6V4aaYKEhj3B0wTO+GVEUMz0Rmzw # LunrZhtQql9MOrdJIvgPrrFRmGHamnNu3IV0750xrRPQ1mJlVevaaCpl1IlaVeXG # /PnY8HWaDJgwlPMDZVga38KSVQavdC8/Uvdw816a0rBzbclAAUZSNf8cuNeJ7qmk # 2CQp/C8vuarWH0Ut0Qav8uuepd5jDt5TT3crBPhxMRwxsNTsSgjXxe7s3jdVWe2C # +z1sC/KnSmmFUwyu14GA4WsUdz05m4Mmixz4unXemMeexibUA3n4RSTiUYzTNcb4 # NmhEY4WbhuDtnSqqeSFyKtS5WCIG9A8YmcEzHWNsbaZAIEdS5QlxCSocbzG2mO6G # zD/kWMn0nmYWejYgaT3LcL5BvkwmePV6u3jQNmVL8aQgG+OPZh7tvCR2gSMPWpml # Y2pVvKZ+Tcx3GqZOUqKsrA== # =oPnm # -----END PGP SIGNATURE----- # gpg: Signature made Sat 22 Jun 2024 05:06:00 AM PDT # gpg: using RSA key E1A5C593CD419DE28E8315CF3C2525ED14360CDE # gpg: issuer "peter.maydell@linaro.org" # gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>" [full] # gpg: aka "Peter Maydell <pmaydell@gmail.com>" [full] # gpg: aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>" [full] # gpg: aka "Peter Maydell <peter@archaic.org.uk>" [unknown] * tag 'pull-target-arm-20240622' of https://git.linaro.org/people/pmaydell/qemu-arm: hw/arm/sbsa-ref: Enable CPU cluster on ARM sbsa machine hw/usb/hcd-ohci: Fix ohci_service_td: accept zero-length TDs where CBP=BE+1 hw/misc: Set valid access size for Exynos4210 RNG hw/arm/virt: Avoid unexpected warning from Linux guest on host with Fujitsu CPUs hw/arm/virt: allow creation of a second NonSecure UART hw/arm/virt: Rename VIRT_UART and VIRT_SECURE_UART to VIRT_UART[01] hw/arm/virt: Add serial aliases in DTB hw/usb/hcd-dwc2: Handle invalid address access in read and write functions hw/timer/a9gtimer: Handle QTest mode in a9_gtimer_get_current_cpu scripts/coverity-scan/COMPONENTS.md: Include libqmp in testlibs scripts/coverity-scan/COMPONENTS.md: Fix monitor component scripts/coverity-scan/COMPONENTS.md: Add crypto headers in host/include to the crypto component scripts/coverity-scan/COMPONENTS.md: Fix 'char' component scripts/coverity-scan/COMPONENTS.md: Update paths to match gitlab CI hw/arm/xilinx_zynq: Fix IRQ/FIQ routing hw/intc/arm_gic: Fix deactivation of SPI lines hw/arm/sbsa-ref: switch to 1GHz timer frequency hw/net/can/xlnx-versal-canfd: Fix sorting of the tx queue Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-06-21Merge tag 'migration-20240621-pull-request' of ↵Richard Henderson
https://gitlab.com/farosas/qemu into staging Migration pull request - Fabiano's fix for fdset + file migration truncating the migration file - Fabiano's fdset + direct-io support for mapped-ram - Peter's various cleanups (multifd sync, thread names, migration states, tests) - Peter's new migration state postcopy-recover-setup - Philippe's unused vmstate macro cleanup # -----BEGIN PGP SIGNATURE----- # # iQJEBAABCAAuFiEEqhtIsKIjJqWkw2TPx5jcdBvsMZ0FAmZ1vIsQHGZhcm9zYXNA # c3VzZS5kZQAKCRDHmNx0G+wxnVZTEACdFIsQ/PJw2C9eeLNor5B5MNSEqUjxX0KN # 6s/uTkJ/dcv+2PI92SzRCZ1dpR5e9AyjTFYbLc9tPRBIROEhlUaoc84iyEy0jCFU # eJ65/RQbH5QHRpOZwbN5RmGwnapfOWHGTn3bpdrmSQTOAy8R2TPGY4SVYR+gamTn # bAv1cAsrOOBUfCi8aqvSlmvuliOW0lzJdF4XHa3mAaigLoF14JdwUZdyIMP1mLDp # /fllbHCKCvJ1vprE9hQmptBR9PzveJZOZamIVt96djJr5+C869+9PMCn3a5vxqNW # b+/LhOZjac37Ecg5kgbq+cO1E4EXKC3zWOmDTw8kHUwp9oYNi1upwLdpHbAAZaQD # /JmHKsExx9QuV8mrVyGBXMI92E6RrT54b1Bjcuo63gAP8p9JRRxGT22U3LghNbTm # 1XcGPR3rswjT1yTgE6qAqAIMR+7X5MrJVWop9ub/lF5DQ1VYIwmlKSNdwDHFDhRq # 0F1k2+EksNpcZ0BH2+3iFml7qKHLVupLQKTWcLdrlnQnTfSG3+yW7eyA5Mte79Qp # nJPcHt8qBqUVQ9Uf/4490TM4Lrp+T+m16exIi0tISLaDXSVkFJnlowipSm+tQ7U3 # Sm68JWdWWEsXZVaMqJeBE8nA/hCoQDpo4hVdwftStI+NayXbRX/EgvPqrNAvwh+c # i4AdHdn6hQ== # =ZX0p # -----END PGP SIGNATURE----- # gpg: Signature made Fri 21 Jun 2024 10:46:51 AM PDT # gpg: using RSA key AA1B48B0A22326A5A4C364CFC798DC741BEC319D # gpg: issuer "farosas@suse.de" # gpg: Good signature from "Fabiano Rosas <farosas@suse.de>" [unknown] # gpg: aka "Fabiano Almeida Rosas <fabiano.rosas@suse.com>" [unknown] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: AA1B 48B0 A223 26A5 A4C3 64CF C798 DC74 1BEC 319D * tag 'migration-20240621-pull-request' of https://gitlab.com/farosas/qemu: (28 commits) migration: Remove unused VMSTATE_ARRAY_TEST() macro tests/migration-tests: Cover postcopy failure on reconnect tests/migration-tests: Verify postcopy-recover-setup status tests/migration-tests: migration_event_wait() tests/migration-tests: Always enable migration events tests/migration-tests: Drop most WIN32 ifdefs for postcopy failure tests migration/docs: Update postcopy recover session for SETUP phase migration/postcopy: Add postcopy-recover-setup phase migration: Cleanup incoming migration setup state change migration: Use MigrationStatus instead of int migration: Rename thread debug names migration/multifd: Avoid the final FLUSH in complete() tests/qtest/migration: Add a test for mapped-ram with passing of fds migration: Add documentation for fdset with multifd + file monitor: fdset: Match against O_DIRECT tests/qtest/migration: Add tests for file migration with direct-io migration/multifd: Add direct-io support migration: Add direct-io parameter io: Stop using qemu_open_old in channel-file monitor: Report errors from monitor_fdset_dup_fd_add ... Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-06-21migration: Remove unused VMSTATE_ARRAY_TEST() macroPhilippe Mathieu-Daudé
Last use of VMSTATE_ARRAY_TEST() was removed in commit 46baa9007f ("migration/i386: Remove old non-softfloat 64bit FP support"), we can safely get rid of it. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2024-06-21tests/migration-tests: Cover postcopy failure on reconnectPeter Xu
Make sure there will be an event for postcopy recovery, irrelevant of whether the reconnect will success, or when the failure happens. The added new case is to fail early in postcopy recovery, in which case it didn't even reach RECOVER stage on src (and in real life it'll be the same to dest, but the test case is just slightly more involved due to the dual socketpair setup). To do that, rename the postcopy_recovery_test_fail to reflect either stage to fail, instead of a boolean. Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2024-06-21tests/migration-tests: Verify postcopy-recover-setup statusPeter Xu
Making sure the postcopy-recover-setup status is present in the postcopy failure unit test. Note that it only applies to src QEMU not dest. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2024-06-21hw/arm/sbsa-ref: Enable CPU cluster on ARM sbsa machineXiong Yining
Enable CPU cluster support on SbsaQemu platform, so that users can specify a 4-level CPU hierarchy sockets/clusters/cores/threads. And this topology can be passed to the firmware through /cpus/topology Device Tree. Signed-off-by: Xiong Yining <xiongyining1480@phytium.com.cn> Reviewed-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org> Reviewed-by: Leif Lindholm <quic_llindhol@quicinc.com> Message-id: 20240607103825.1295328-2-xiongyining1480@phytium.com.cn Tested-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-06-21hw/usb/hcd-ohci: Fix ohci_service_td: accept zero-length TDs where CBP=BE+1David Hubbard
This changes the way the ohci emulation handles a Transfer Descriptor with "Buffer End" set to "Current Buffer Pointer" - 1, specifically in the case of a zero-length packet. The OHCI spec 4.3.1.2 Table 4-2 specifies td.cbp to be zero for a zero-length packet. Peter Maydell tracked down commit 1328fe0c32 (hw: usb: hcd-ohci: check len and frame_number variables) where qemu started checking this according to the spec. What this patch does is loosen the qemu ohci implementation to allow a zero-length packet if td.be (Buffer End) is set to td.cbp - 1, and with a non-zero td.cbp value. The spec is unclear whether this is valid or not -- it is not the clearly documented way to send a zero length TD (which is CBP=BE=0), but it isn't specifically forbidden. Actual hw seems to be ok with it. Does any OS rely on this behavior? There have been no reports to qemu-devel of this problem. This is attempting to have qemu behave like actual hardware, but this is just a minor change. With a tiny OS[1] that boots and executes a test, the issue can be seen: * OS that sends USB requests to a USB mass storage device but sends td.cbp = td.be + 1 * qemu 4.2 * qemu HEAD (4e66a0854) * Actual OHCI controller (hardware) Command line: qemu-system-x86_64 -m 20 \ -device pci-ohci,id=ohci \ -drive if=none,format=raw,id=d,file=testmbr.raw \ -device usb-storage,bus=ohci.0,drive=d \ --trace "usb_*" --trace "ohci_*" -D qemu.log Results are: qemu 4.2 | qemu HEAD | actual HW -----------+------------+----------- works fine | ohci_die() | works fine Tip: if the flags "-serial pty -serial stdio" are added to the command line the test will output USB requests like this: Testing qemu HEAD: > Free mem 2M ohci port2 conn FS > setup { 80 6 0 1 0 0 8 0 } > ED info=80000 { mps=8 en=0 d=0 } tail=c20920 > td0 c20880 nxt=c20960 f2000000 setup cbp=c20900 be=c20907 > td1 c20960 nxt=c20980 f3140000 in cbp=c20908 be=c2090f > td2 c20980 nxt=c20920 f3080000 out cbp=c20910 be=c2090f ohci20 host err > usb stopped And in qemu.log: usb_ohci_iso_td_bad_cc_overrun ISO_TD start_offset=0x00c20910 > next_offset=0x00c2090f Testing qemu 4.2: > Free mem 2M ohci port2 conn FS > setup { 80 6 0 1 0 0 8 0 } > ED info=80000 { mps=8 en=0 d=0 } tail=620920 > td0 620880 nxt=620960 f2000000 setup cbp=620900 be=620907 cbp=0 be=620907 > td1 620960 nxt=620980 f3140000 in cbp=620908 be=62090f cbp=0 be=62090f > td2 620980 nxt=620920 f3080000 out cbp=620910 be=62090f cbp=0 be=62090f > rx { 12 1 0 2 0 0 0 8 } > setup { 0 5 1 0 0 0 0 0 } tx {} > ED info=80000 { mps=8 en=0 d=0 } tail=620880 > td0 620920 nxt=620960 f2000000 setup cbp=620900 be=620907 cbp=0 be=620907 > td1 620960 nxt=620880 f3100000 in cbp=620908 be=620907 cbp=0 be=620907 > setup { 80 6 0 1 0 0 12 0 } > ED info=80001 { mps=8 en=0 d=1 } tail=620960 > td0 620880 nxt=6209c0 f2000000 setup cbp=620920 be=620927 cbp=0 be=620927 > td1 6209c0 nxt=6209e0 f3140000 in cbp=620928 be=620939 cbp=0 be=620939 > td2 6209e0 nxt=620960 f3080000 out cbp=62093a be=620939 cbp=0 be=620939 > rx { 12 1 0 2 0 0 0 8 f4 46 1 0 0 0 1 2 3 1 } > setup { 80 6 0 2 0 0 0 1 } > ED info=80001 { mps=8 en=0 d=1 } tail=620880 > td0 620960 nxt=6209a0 f2000000 setup cbp=620a20 be=620a27 cbp=0 be=620a27 > td1 6209a0 nxt=6209c0 f3140004 in cbp=620a28 be=620b27 cbp=620a48 be=620b27 > td2 6209c0 nxt=620880 f3080000 out cbp=620b28 be=620b27 cbp=0 be=620b27 > rx { 9 2 20 0 1 1 4 c0 0 9 4 0 0 2 8 6 50 0 7 5 81 2 40 0 0 7 5 2 2 40 0 0 } > setup { 0 9 1 0 0 0 0 0 } tx {} > ED info=80001 { mps=8 en=0 d=1 } tail=620900 > td0 620880 nxt=620940 f2000000 setup cbp=620a00 be=620a07 cbp=0 be=620a07 > td1 620940 nxt=620900 f3100000 in cbp=620a08 be=620a07 cbp=0 be=620a07 [1] The OS disk image has been emailed to philmd@linaro.org, mjt@tls.msk.ru, and kraxel@redhat.com: * testCbpOffBy1.img.xz * sha256: f87baddcb86de845de12f002c698670a426affb40946025cc32694f9daa3abed Signed-off-by: David Hubbard <dmamfmgm@gmail.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-06-21hw/misc: Set valid access size for Exynos4210 RNGZheyu Ma
The Exynos4210 RNG module requires 32-bit (4-byte) accesses to its registers. According to the User Manual Section 25.3[1], the registers for RNG operations are 32-bit. This change ensures that the memory region operations for the RNG module enforce the correct access sizes, preventing invalid memory accesses. [1] http://www.mediafire.com/view/8ly2fqls3c9c31c/Exynos_4412_SCP_Users_Manual_Ver.0.10.00_Preliminary0.pdf Reproducer: cat << EOF | qemu-system-aarch64 -display none \ -machine accel=qtest, -m 512M -machine smdkc210 -qtest stdio readb 0x10830454 EOF Suggested-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Zheyu Ma <zheyuma97@gmail.com> Message-id: 20240618163701.3204975-1-zheyuma97@gmail.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-06-21hw/arm/virt: Avoid unexpected warning from Linux guest on host with Fujitsu CPUsZhenyu Zhang
Multiple warning messages and corresponding backtraces are observed when Linux guest is booted on the host with Fujitsu CPUs. One of them is shown as below. [ 0.032443] ------------[ cut here ]------------ [ 0.032446] uart-pl011 9000000.pl011: ARCH_DMA_MINALIGN smaller than CTR_EL0.CWG (128 < 256) [ 0.032454] WARNING: CPU: 0 PID: 1 at arch/arm64/mm/dma-mapping.c:54 arch_setup_dma_ops+0xbc/0xcc [ 0.032470] Modules linked in: [ 0.032475] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.14.0-452.el9.aarch64 [ 0.032481] Hardware name: linux,dummy-virt (DT) [ 0.032484] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 0.032490] pc : arch_setup_dma_ops+0xbc/0xcc [ 0.032496] lr : arch_setup_dma_ops+0xbc/0xcc [ 0.032501] sp : ffff80008003b860 [ 0.032503] x29: ffff80008003b860 x28: 0000000000000000 x27: ffffaae4b949049c [ 0.032510] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000 [ 0.032517] x23: 0000000000000100 x22: 0000000000000000 x21: 0000000000000000 [ 0.032523] x20: 0000000100000000 x19: ffff2f06c02ea400 x18: ffffffffffffffff [ 0.032529] x17: 00000000208a5f76 x16: 000000006589dbcb x15: ffffaae4ba071c89 [ 0.032535] x14: 0000000000000000 x13: ffffaae4ba071c84 x12: 455f525443206e61 [ 0.032541] x11: 68742072656c6c61 x10: 0000000000000029 x9 : ffffaae4b7d21da4 [ 0.032547] x8 : 0000000000000029 x7 : 4c414e494d5f414d x6 : 0000000000000029 [ 0.032553] x5 : 000000000000000f x4 : ffffaae4b9617a00 x3 : 0000000000000001 [ 0.032558] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff2f06c029be40 [ 0.032564] Call trace: [ 0.032566] arch_setup_dma_ops+0xbc/0xcc [ 0.032572] of_dma_configure_id+0x138/0x300 [ 0.032591] amba_dma_configure+0x34/0xc0 [ 0.032600] really_probe+0x78/0x3dc [ 0.032614] __driver_probe_device+0x108/0x160 [ 0.032619] driver_probe_device+0x44/0x114 [ 0.032624] __device_attach_driver+0xb8/0x14c [ 0.032629] bus_for_each_drv+0x88/0xe4 [ 0.032634] __device_attach+0xb0/0x1e0 [ 0.032638] device_initial_probe+0x18/0x20 [ 0.032643] bus_probe_device+0xa8/0xb0 [ 0.032648] device_add+0x4b4/0x6c0 [ 0.032652] amba_device_try_add.part.0+0x48/0x360 [ 0.032657] amba_device_add+0x104/0x144 [ 0.032662] of_amba_device_create.isra.0+0x100/0x1c4 [ 0.032666] of_platform_bus_create+0x294/0x35c [ 0.032669] of_platform_populate+0x5c/0x150 [ 0.032672] of_platform_default_populate_init+0xd0/0xec [ 0.032697] do_one_initcall+0x4c/0x2e0 [ 0.032701] do_initcalls+0x100/0x13c [ 0.032707] kernel_init_freeable+0x1c8/0x21c [ 0.032712] kernel_init+0x28/0x140 [ 0.032731] ret_from_fork+0x10/0x20 [ 0.032735] ---[ end trace 0000000000000000 ]--- In Linux, a check is applied to every device which is exposed through device-tree node. The warning message is raised when the device isn't DMA coherent and the cache line size is larger than ARCH_DMA_MINALIGN (128 bytes). The cache line is sorted from CTR_EL0[CWG], which corresponds to 256 bytes on the guest CPUs. The DMA coherent capability is claimed through 'dma-coherent' in their device-tree nodes or parent nodes. This happens even when the device doesn't implement or use DMA at all, for legacy reasons. Fix the issue by adding 'dma-coherent' property to the device-tree root node, meaning all devices are capable of DMA coherent by default. This both suppresses the spurious kernel warnings and also guards against possible future QEMU bugs where we add a DMA-capable device and forget to mark it as dma-coherent. Signed-off-by: Zhenyu Zhang <zhenyzha@redhat.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Donald Dutile <ddutile@redhat.com Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Message-id: 20240612020506.307793-1-zhenyzha@redhat.com [PMM: tweaked commit message] Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-06-21hw/arm/virt: allow creation of a second NonSecure UARTPeter Maydell
For some use-cases, it is helpful to have more than one UART available to the guest. If the second UART slot is not already used for a TrustZone Secure-World-only UART, create it as a NonSecure UART only when the user provides a serial backend (e.g. via a second -serial command line option). This avoids problems where existing guest software only expects a single UART, and gets confused by the second UART in the DTB. The major example of this is older EDK2 firmware, which will send the GRUB bootloader output to UART1 and the guest serial output to UART0. Users who want to use both UARTs with a guest setup including EDK2 are advised to update to EDK2 release edk2-stable202311 or newer. (The prebuilt EDK2 blobs QEMU upstream provides are new enough.) The relevant EDK2 changes are the ones described here: https://bugzilla.tianocore.org/show_bug.cgi?id=4577 Inspired-by: Axel Heider <axel.heider@hensoldt.net> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Tested-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-id: 20240610162343.2131524-4-peter.maydell@linaro.org
2024-06-21hw/arm/virt: Rename VIRT_UART and VIRT_SECURE_UART to VIRT_UART[01]Peter Maydell
We're going to make the second UART not always a secure-only device. Rename the constants VIRT_UART and VIRT_SECURE_UART to VIRT_UART0 and VIRT_UART1 accordingly. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-id: 20240610162343.2131524-3-peter.maydell@linaro.org
2024-06-21hw/arm/virt: Add serial aliases in DTBPeter Maydell
If there is more than one UART in the DTB, then there is no guarantee on which order a guest is supposed to initialise them. The standard solution to this is "serialN" entries in the "/aliases" node of the dtb which give the nodename of the UARTs. At the moment we only have two UARTs in the DTB when one is for the Secure world and one for the Non-Secure world, so this isn't really a problem. However if we want to add a second NS UART we'll need the aliases to ensure guests pick the right one. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-id: 20240610162343.2131524-2-peter.maydell@linaro.org
2024-06-21hw/usb/hcd-dwc2: Handle invalid address access in read and write functionsZheyu Ma
This commit modifies the dwc2_hsotg_read() and dwc2_hsotg_write() functions to handle invalid address access gracefully. Instead of using g_assert_not_reached(), which causes the program to abort, the functions now log an error message and return a default value for reads or do nothing for writes. This change prevents the program from aborting and provides clear log messages indicating when an invalid memory address is accessed. Reproducer: cat << EOF | qemu-system-aarch64 -display none \ -machine accel=qtest, -m 512M -machine raspi2b -m 1G -nodefaults \ -usb -drive file=null-co://,if=none,format=raw,id=disk0 -device \ usb-storage,port=1,drive=disk0 -qtest stdio readl 0x3f980dfb EOF Signed-off-by: Zheyu Ma <zheyuma97@gmail.com> Reviewed-by: Paul Zimmerman <pauldzim@gmail.com> Message-id: 20240618135610.3109175-1-zheyuma97@gmail.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-06-21hw/timer/a9gtimer: Handle QTest mode in a9_gtimer_get_current_cpuZheyu Ma
This commit updates the a9_gtimer_get_current_cpu() function to handle cases where QTest is enabled. When QTest is used, it returns 0 instead of dereferencing the current_cpu, which can be NULL. This prevents the program from crashing during QTest runs. Reproducer: cat << EOF | qemu-system-aarch64 -display \ none -machine accel=qtest, -m 512M -machine npcm750-evb -qtest stdio writel 0xf03fe20c 0x26d7468c EOF Signed-off-by: Zheyu Ma <zheyuma97@gmail.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-id: 20240618144009.3137806-1-zheyuma97@gmail.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-06-21scripts/coverity-scan/COMPONENTS.md: Include libqmp in testlibsPeter Maydell
Add libqmp to the testlibs component. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-id: 20240604145934.1230583-6-peter.maydell@linaro.org
2024-06-21scripts/coverity-scan/COMPONENTS.md: Fix monitor componentPeter Maydell
Update the 'monitor' component: * qapi/ and monitor/ are now subdirectories * add job-qmp.c Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-id: 20240604145934.1230583-5-peter.maydell@linaro.org
2024-06-21scripts/coverity-scan/COMPONENTS.md: Add crypto headers in host/include to ↵Peter Maydell
the crypto component host/include/*/host/crypto/ are relatively new headers; add them to the crypto component. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-id: 20240604145934.1230583-4-peter.maydell@linaro.org
2024-06-21scripts/coverity-scan/COMPONENTS.md: Fix 'char' componentPeter Maydell
The 'char' component: * includes the no-longer-present qemu-char.c, which has been long since split into the chardev/ backend code * also includes the hw/char devices Split it into two components: * char is the hw/char devices * chardev is the chardev backends with regexes matching our current sources. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-id: 20240604145934.1230583-3-peter.maydell@linaro.org
2024-06-21scripts/coverity-scan/COMPONENTS.md: Update paths to match gitlab CIPeter Maydell
Since commit 83aa1baa069c we have been running the build for Coverity Scan as a Gitlab CI job, rather than the old setup where it was run on a local developer's machine. This is working well, but the absolute paths of files are different for the Gitlab CI job, which means that the regexes we use to identify Coverity components no longer work. With Gitlab CI builds the file paths are of the form /builds/qemu-project/qemu/accel/kvm/kvm-all.c rather than the old /qemu/accel/kvm/kvm-all.c and our regexes all don't match. Update all the regexes to start with .*/qemu/ . This will hopefully avoid the need to change them again in future if the build path changes again. This change was made with a search-and-replace of (/qemu)? to .*/qemu . Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-id: 20240604145934.1230583-2-peter.maydell@linaro.org
2024-06-21hw/arm/xilinx_zynq: Fix IRQ/FIQ routingSebastian Huber
Fix the system bus interrupt line to CPU core assignment. Fixes: ddcf58e044ce0 ("hw/arm/xilinx_zynq: Support up to two CPU cores") Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-id: 20240610052906.4432-1-sebastian.huber@embedded-brains.de Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-06-21hw/intc/arm_gic: Fix deactivation of SPI linesEdgar E. Iglesias
Julien reported that he has seen strange behaviour when running Xen on QEMU using GICv2. When Xen migrates a guest's vCPU from one pCPU to another while the vCPU is handling an interrupt, the guest is unable to properly deactivate interrupts. Looking at it a little closer, our GICv2 model treats deactivation of SPI lines as if they were PPI's, i.e banked per CPU core. The state for active interrupts should only be banked for PPI lines, not for SPI lines. Make deactivation of SPI lines unbanked, similar to how we handle writes to GICD_ICACTIVER. Reported-by: Julien Grall <julien@xen.org> Signed-off-by: Edgar E. Iglesias <edgar.iglesias@amd.com> Message-id: 20240605143044.2029444-2-edgar.iglesias@gmail.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-06-21hw/arm/sbsa-ref: switch to 1GHz timer frequencyMarcin Juszkiewicz
Updated firmware for QEMU CI is already in merge queue so we can move platform to be future proof. All supported cpus work fine with 1GHz timer frequency when firmware is fresh enough. Signed-off-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org> Reviewed-by: Leif Lindholm <quic_llindhol@quicinc.com> Message-id: 20240531093729.220758-2-marcin.juszkiewicz@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-06-21hw/net/can/xlnx-versal-canfd: Fix sorting of the tx queueShiva sagar Myana
Returning an uint32_t casted to a gint from g_cmp_ids causes the tx queue to become wrongly sorted when executing g_slist_sort. Fix this by always returning -1 or 1 from g_cmp_ids based on the ID comparison instead. Also, if two message IDs are the same, sort them by using their index and transmit the message at the lowest index first. Signed-off-by: Shiva sagar Myana <Shivasagar.Myana@amd.com> Reviewed-by: Francisco Iglesias <francisco.iglesias@amd.com> Message-id: 20240603051732.3334571-1-Shivasagar.Myana@amd.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-06-21tests/migration-tests: migration_event_wait()Peter Xu
Introduce a small helper to wait for a migration event, generalized from the incoming migration path. Make the helper easier to use by allowing it to keep waiting until the expected event is received. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2024-06-21tests/migration-tests: Always enable migration eventsPeter Xu
Libvirt should always enable it, so it'll be nice qtest also cover that for all tests on both sides. migrate_incoming_qmp() used to enable it only on dst, now we enable them on both, as we'll start to sanity check events even on the src QEMU. We'll need to leave the one in migrate_incoming_qmp(), because virtio-net-failover test uses that one only, and it relies on the events to work. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2024-06-21tests/migration-tests: Drop most WIN32 ifdefs for postcopy failure testsPeter Xu
Most of them are not needed, we can stick with one ifdef inside postcopy_recover_fail() so as to cover the scm right tricks only. The tests won't run on windows anyway due to has_uffd always false. Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2024-06-21migration/docs: Update postcopy recover session for SETUP phasePeter Xu
Firstly, the "Paused" state was added in the wrong place before. The state machine section was describing PostcopyState, rather than MigrationStatus. Drop the Paused state descriptions. Then in the postcopy recover session, add more information on the state machine for MigrationStatus in the lines. Add the new RECOVER_SETUP phase. Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Peter Xu <peterx@redhat.com> [fix typo s/reconnects/reconnect] Signed-off-by: Fabiano Rosas <farosas@suse.de>
2024-06-21migration/postcopy: Add postcopy-recover-setup phasePeter Xu
This patch adds a migration state on src called "postcopy-recover-setup". The new state will describe the intermediate step starting from when the src QEMU received a postcopy recovery request, until the migration channels are properly established, but before the recovery process take place. The request came from Libvirt where Libvirt currently rely on the migration state events to detect migration state changes. That works for most of the migration process but except postcopy recovery failures at the beginning. Currently postcopy recovery only has two major states: - postcopy-paused: this is the state that both sides of QEMU will be in for a long time as long as the migration channel was interrupted. - postcopy-recover: this is the state where both sides of QEMU handshake with each other, preparing for a continuation of postcopy which used to be interrupted. The issue here is when the recovery port is invalid, the src QEMU will take the URI/channels, noticing the ports are not valid, and it'll silently keep in the postcopy-paused state, with no event sent to Libvirt. In this case, the only thing Libvirt can do is to poll the migration status with a proper interval, however that's less optimal. Considering that this is the only case where Libvirt won't get a notification from QEMU on such events, let's add postcopy-recover-setup state to mimic what we have with the "setup" state of a newly initialized migration, describing the phase of connection establishment. With that, postcopy recovery will have two paths to go now, and either path will guarantee an event generated. Now the events will look like this during a recovery process on src QEMU: - Initially when the recovery is initiated on src, QEMU will go from "postcopy-paused" -> "postcopy-recover-setup". Old QEMUs don't have this event. - Depending on whether the channel re-establishment is succeeded: - In succeeded case, src QEMU will move from "postcopy-recover-setup" to "postcopy-recover". Old QEMUs also have this event. - In failure case, src QEMU will move from "postcopy-recover-setup" to "postcopy-paused" again. Old QEMUs don't have this event. This guarantees that Libvirt will always receive a notification for recovery process properly. One thing to mention is, such new status is only needed on src QEMU not both. On dest QEMU, the state machine doesn't change. Hence the events don't change either. It's done like so because dest QEMU may not have an explicit point of setup start. E.g., it can happen that when dest QEMUs doesn't use migrate-recover command to use a new URI/channel, but the old URI/channels can be reused in recovery, in which case the old ports simply can work again after the network routes are fixed up. Add a new helper postcopy_is_paused() detecting whether postcopy is still paused, taking RECOVER_SETUP into account too. When using it on both src/dst, a slight change is done altogether to always wait for the semaphore before checking the status, because for both sides a sem_post() will be required for a recovery. Cc: Jiri Denemark <jdenemar@redhat.com> Cc: Prasad Pandit <ppandit@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Buglink: https://issues.redhat.com/browse/RHEL-38485 Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2024-06-21migration: Cleanup incoming migration setup state changePeter Xu
Destination QEMU can setup incoming ports for two purposes: either a fresh new incoming migration, in which QEMU will switch to SETUP for channel establishment, or a paused postcopy migration, in which QEMU will stay in POSTCOPY_PAUSED until kicking off the RECOVER phase. Now the state machine worked on dest node for the latter, only because migrate_set_state() implicitly will become a noop if the current state check failed. It wasn't clear at all. Clean it up by providing a helper migration_incoming_state_setup() doing proper checks over current status. Postcopy-paused will be explicitly checked now, and then we can bail out for unknown states. Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2024-06-21migration: Use MigrationStatus instead of intPeter Xu
QEMU uses "int" in most cases even if it stores MigrationStatus. I don't know why, so let's try to do that right and see what blows up.. Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2024-06-21migration: Rename thread debug namesPeter Xu
The postcopy thread names on dest QEMU are slightly confusing, partly I'll need to blame myself on 36f62f11e4 ("migration: Postcopy preemption preparation on channel creation"). E.g., "fault-fast" reads like a fast version of "fault-default", but it's actually the fast version of "postcopy/listen". Taking this chance, rename all the migration threads with proper rules. Considering we only have 15 chars usable, prefix all threads with "mig/", meanwhile identify src/dst threads properly this time. So now most thread names will look like "mig/DIR/xxx", where DIR will be "src"/"dst", except the bg-snapshot thread which doesn't have a direction. For multifd threads, making them "mig/{src|dst}/{send|recv}_%d". We used to have "live_migration" thread for a very long time, now it's called "mig/src/main". We may hope to have "mig/dst/main" soon but not yet. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Zhijian Li (Fujitsu) <lizhijian@fujitsu.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2024-06-21migration/multifd: Avoid the final FLUSH in complete()Peter Xu
We always do the flush when finishing one round of scan, and during complete() phase we should scan one more round making sure no dirty page existed. In that case we shouldn't need one explicit FLUSH at the end of complete(), as when reaching there all pages should have been flushed. Reviewed-by: Fabiano Rosas <farosas@suse.de> Tested-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2024-06-21tests/qtest/migration: Add a test for mapped-ram with passing of fdsFabiano Rosas
Add a multifd test for mapped-ram with passing of fds into QEMU. This is how libvirt will consume the feature. There are a couple of details to the fdset mechanism: - multifd needs two distinct file descriptors (not duplicated with dup()) so it can enable O_DIRECT only on the channels that do aligned IO. The dup() system call creates file descriptors that share status flags, of which O_DIRECT is one. - the open() access mode flags used for the fds passed into QEMU need to match the flags QEMU uses to open the file. Currently O_WRONLY for src and O_RDONLY for dst. Note that fdset code goes under _WIN32 because fd passing is not supported on Windows. Reviewed-by: Peter Xu <peterx@redhat.com> [brought back the qmp_remove_fd() call at the end of the tests] Signed-off-by: Fabiano Rosas <farosas@suse.de>
2024-06-21migration: Add documentation for fdset with multifd + fileFabiano Rosas
With the last few changes to the fdset infrastructure, we now allow multifd to use an fdset when migrating to a file. This is useful for the scenario where the management layer wants to have control over the migration file. By receiving the file descriptors directly, QEMU can delegate some high level operating system operations to the management layer (such as mandatory access control). The management layer might also want to add its own headers before the migration stream. Document the "file:/dev/fdset/#" syntax for the multifd migration with mapped-ram. The requirements for the fdset mechanism are: - the fdset must contain two fds that are not duplicates between themselves; - if direct-io is to be used, exactly one of the fds must have the O_DIRECT flag set; - the file must be opened with WRONLY on the migration source side; - the file must be opened with RDONLY on the migration destination side. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2024-06-21monitor: fdset: Match against O_DIRECTFabiano Rosas
We're about to enable the use of O_DIRECT in the migration code and due to the alignment restrictions imposed by filesystems we need to make sure the flag is only used when doing aligned IO. The migration will do parallel IO to different regions of a file, so we need to use more than one file descriptor. Those cannot be obtained by duplicating (dup()) since duplicated file descriptors share the file status flags, including O_DIRECT. If one migration channel does unaligned IO while another sets O_DIRECT to do aligned IO, the filesystem would fail the unaligned operation. The add-fd QMP command along with the fdset code are specifically designed to allow the user to pass a set of file descriptors with different access flags into QEMU to be later fetched by code that needs to alternate between those flags when doing IO. Extend the fdset matching to behave the same with the O_DIRECT flag. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2024-06-21tests/qtest/migration: Add tests for file migration with direct-ioFabiano Rosas
The tests are only allowed to run in systems that know about the O_DIRECT flag and in filesystems which support it. Note: this also brings back migrate_set_parameter_bool() which went away when we removed the compression tests. I copied it verbatim. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2024-06-21migration/multifd: Add direct-io supportFabiano Rosas
When multifd is used along with mapped-ram, we can take benefit of a filesystem that supports the O_DIRECT flag and perform direct I/O in the multifd threads. This brings a significant performance improvement because direct-io writes bypass the page cache which would otherwise be thrashed by the multifd data which is unlikely to be needed again in a short period of time. To be able to use a multifd channel opened with O_DIRECT, we must ensure that a certain aligment is used. Filesystems usually require a block-size alignment for direct I/O. The way to achieve this is by enabling the mapped-ram feature, which already aligns its I/O properly (see MAPPED_RAM_FILE_OFFSET_ALIGNMENT at ram.c). By setting O_DIRECT on the multifd channels, all writes to the same file descriptor need to be aligned as well, even the ones that come from outside multifd, such as the QEMUFile I/O from the main migration code. This makes it impossible to use the same file descriptor for the QEMUFile and for the multifd channels. The various flags and metadata written by the main migration code will always be unaligned by virtue of their small size. To workaround this issue, we'll require a second file descriptor to be used exclusively for direct I/O. The second file descriptor can be obtained by QEMU by re-opening the migration file (already possible), or by being provided by the user or management application (support to be added in future patches). Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2024-06-21migration: Add direct-io parameterFabiano Rosas
Add the direct-io migration parameter that tells the migration code to use O_DIRECT when opening the migration stream file whenever possible. This is currently only used with the mapped-ram migration that has a clear window guaranteed to perform aligned writes. Acked-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2024-06-21io: Stop using qemu_open_old in channel-fileFabiano Rosas
We want to make use of the Error object to report fdset errors from qemu_open_internal() and passing the error pointer to qemu_open_old() would require changing all callers. Move the file channel to the new API instead. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2024-06-21monitor: Report errors from monitor_fdset_dup_fd_addFabiano Rosas
I'm keeping the EACCES because callers expect to be able to look at errno. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>