aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-06-089pfs: prevent opening special files (CVE-2023-2861)Christian Schoenebeck
The 9p protocol does not specifically define how server shall behave when client tries to open a special file, however from security POV it does make sense for 9p server to prohibit opening any special file on host side in general. A sane Linux 9p client for instance would never attempt to open a special file on host side, it would always handle those exclusively on its guest side. A malicious client however could potentially escape from the exported 9p tree by creating and opening a device file on host side. With QEMU this could only be exploited in the following unsafe setups: - Running QEMU binary as root AND 9p 'local' fs driver AND 'passthrough' security model. or - Using 9p 'proxy' fs driver (which is running its helper daemon as root). These setups were already discouraged for safety reasons before, however for obvious reasons we are now tightening behaviour on this. Fixes: CVE-2023-2861 Reported-by: Yanwu Shen <ywsPlz@gmail.com> Reported-by: Jietao Xiao <shawtao1125@gmail.com> Reported-by: Jinku Li <jkli@xidian.edu.cn> Reported-by: Wenbo Shen <shenwenbo@zju.edu.cn> Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Michael Tokarev <mjt@tls.msk.ru> Message-Id: <E1q6w7r-0000Q0-NM@lizzy.crudebyte.com> (cherry picked from commit f6b0de53fb87ddefed348a39284c8e2f28dc4eda) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> (Mjt: drop adding qemu_fstat wrapper for 7.2 where wrappers aren't used)
2023-06-08qga: Fix suspend on Linux guests without systemdMark Somerville
Allow the Linux guest agent to attempt each of the suspend methods (systemctl, pm-* and writing to /sys) in turn. Prior to this guests without systemd failed to suspend due to `guest_suspend` returning early regardless of the return value of `systemd_supports_mode`. Signed-off-by: Mark Somerville <mark@qpok.net> Reviewed-by: Konstantin Kostiuk <kkostiuk@redhat.com> Signed-off-by: Konstantin Kostiuk <kkostiuk@redhat.com> (cherry picked from commit 86dcb6ab9b603450eb6d896cdc95286de2c7d561) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-06-07docs: fix multi-process QEMU documentationJagannathan Raman
Fix a typo in the system documentation for multi-process QEMU. Signed-off-by: Jagannathan Raman <jag.raman@oracle.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> (cherry picked from commit 7771e8b86335968ee46538d1afd44246e7a062bc) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-06-07s390x/tcg: Fix CPU address returned by STIDPIlya Leoshkevich
In qemu-user-s390x, /proc/cpuinfo contains: processor 0: version = 00, identification = 000000, machine = 8561 processor 1: version = 00, identification = 400000, machine = 8561 The highest nibble is supposed to contain the CPU address, but it's off by 2 bits. Fix the shift value and provide a symbolic constant for it. With the fix we get: processor 0: version = 00, identification = 000000, machine = 8561 processor 1: version = 00, identification = 100000, machine = 8561 Fixes: 076d4d39b65f ("s390x/cpumodel: wire up cpu type + id for TCG") Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Message-Id: <20230605113950.1169228-2-iii@linux.ibm.com> Signed-off-by: Thomas Huth <thuth@redhat.com> (cherry picked from commit 71b11cbe1c34411238703abe24bfaf2e9712c30d) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-06-07tests/tcg/s390x: Test single-stepping SVCIlya Leoshkevich
Add a small test to prevent regressions. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20230510230213.330134-3-iii@linux.ibm.com> Signed-off-by: Thomas Huth <thuth@redhat.com> (cherry picked from commit be4a4cb429617a8b6893733b37b6203e4b7bf35b) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-06-07linux-user/s390x: Fix single-stepping SVCIlya Leoshkevich
Currently single-stepping SVC executes two instructions. The reason is that EXCP_DEBUG for the SVC instruction itself is masked by EXCP_SVC. Fix by re-raising EXCP_DEBUG. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Message-Id: <20230510230213.330134-2-iii@linux.ibm.com> Signed-off-by: Thomas Huth <thuth@redhat.com> (cherry picked from commit 01b9990a3fb84bb9a14017255ab1a4fa86588215) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-06-07tests/tcg/s390x: Test LOCFHRIlya Leoshkevich
Add a small test to prevent regressions. Cc: qemu-stable@nongnu.org Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Message-Id: <20230526181240.1425579-5-iii@linux.ibm.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com> (cherry picked from commit 230976232f4fcdc205d6ec53ec9f3804b28dc1e7) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-06-07target/s390x: Fix LOCFHR taking the wrong half of R2Ilya Leoshkevich
LOCFHR should write top-to-top, but QEMU erroneously writes bottom-to-top. Fixes: 45aa9aa3b773 ("target/s390x: Implement load-on-condition-2 insns") Cc: qemu-stable@nongnu.org Reported-by: Mikhail Mitskevich <mitskevichmn@gmail.com> Closes: https://gitlab.com/qemu-project/qemu/-/issues/1668 Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Message-Id: <20230526181240.1425579-4-iii@linux.ibm.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com> (cherry picked from commit 3180b173621021c365c256cedf2f5845bd4780d0) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-06-07tests/tcg/s390x: Test LCBBIlya Leoshkevich
Add a test to prevent regressions. Cc: qemu-stable@nongnu.org Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Message-Id: <20230526181240.1425579-3-iii@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com> (cherry picked from commit 05d000fb4dcac4bc02ffa08fcf14b51683b878f6) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-06-07target/s390x: Fix LCBB overwriting the top 32 bitsIlya Leoshkevich
LCBB is supposed to overwrite only the bottom 32 bits, but QEMU erroneously overwrites the entire register. Fixes: 6d9303322ed9 ("s390x/tcg: Implement LOAD COUNT TO BLOCK BOUNDARY") Cc: qemu-stable@nongnu.org Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Message-Id: <20230526181240.1425579-2-iii@linux.ibm.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com> (cherry picked from commit 079181b9bc60389e106009a1530d3cc42256f567) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-05-31target/arm: Explicitly select short-format FSR for M-profilePeter Maydell
For M-profile, there is no guest-facing A-profile format FSR, but we still use the env->exception.fsr field to pass fault information from the point where a fault is raised to the code in arm_v7m_cpu_do_interrupt() which interprets it and sets the M-profile specific fault status registers. So it doesn't matter whether we fill in env->exception.fsr in the short format or the LPAE format, as long as both sides agree. As it happens arm_v7m_cpu_do_interrupt() assumes short-form. In compute_fsr_fsc() we weren't explicitly choosing short-form for M-profile, but instead relied on it falling out in the wash because arm_s1_regime_using_lpae_format() would be false. This was broken in commit 452c67a4 when we added v8R support, because we said "PMSAv8 is always LPAE format" (as it is for v8R), forgetting that we were implicitly using this code path on M-profile. At that point we would hit a g_assert_not_reached(): ERROR:../../target/arm/internals.h:549:arm_fi_to_lfsc: code should not be reached #7 0x0000555555e055f7 in arm_fi_to_lfsc (fi=0x7fffecff9a90) at ../../target/arm/internals.h:549 #8 0x0000555555e05a27 in compute_fsr_fsc (env=0x555557356670, fi=0x7fffecff9a90, target_el=1, mmu_idx=1, ret_fsc=0x7fffecff9a1c) at ../../target/arm/tlb_helper.c:95 #9 0x0000555555e05b62 in arm_deliver_fault (cpu=0x555557354800, addr=268961344, access_type=MMU_INST_FETCH, mmu_idx=1, fi=0x7fffecff9a90) at ../../target/arm/tlb_helper.c:132 #10 0x0000555555e06095 in arm_cpu_tlb_fill (cs=0x555557354800, address=268961344, size=1, access_type=MMU_INST_FETCH, mmu_idx=1, probe=false, retaddr=0) at ../../target/arm/tlb_helper.c:260 The specific assertion changed when commit fcc7404eff24b4c added "assert not M-profile" to arm_is_secure_below_el3(), because the conditions being checked in compute_fsr_fsc() include arm_el_is_aa64(), which will end up calling arm_is_secure_below_el3() and asserting before we try to call arm_fi_to_lfsc(): #7 0x0000555555efaf43 in arm_is_secure_below_el3 (env=0x5555574665a0) at ../../target/arm/cpu.h:2396 #8 0x0000555555efb103 in arm_is_el2_enabled (env=0x5555574665a0) at ../../target/arm/cpu.h:2448 #9 0x0000555555efb204 in arm_el_is_aa64 (env=0x5555574665a0, el=1) at ../../target/arm/cpu.h:2509 #10 0x0000555555efbdfd in compute_fsr_fsc (env=0x5555574665a0, fi=0x7fffecff99e0, target_el=1, mmu_idx=1, ret_fsc=0x7fffecff996c) Avoid the assertion and the incorrect FSR format selection by explicitly making M-profile use the short-format in this function. Fixes: 452c67a42704 ("target/arm: Enable TTBCR_EAE for ARMv8-R AArch32")a Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1658 Cc: qemu-stable@nongnu.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20230523131726.866635-1-peter.maydell@linaro.org (cherry picked from commit d7fe699be54b2cbb8e4ee37b63588b3458a49da7) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-05-31hw/arm/xlnx-zynqmp: fix unsigned error when checking the RPUs numberClément Chigot
When passing --smp with a number lower than XLNX_ZYNQMP_NUM_APU_CPUS, the expression (ms->smp.cpus - XLNX_ZYNQMP_NUM_APU_CPUS) will result in a positive number as ms->smp.cpus is a unsigned int. This will raise the following error afterwards, as Qemu will try to instantiate some additional RPUs. | $ qemu-system-aarch64 --smp 1 -M xlnx-zcu102 | ** | ERROR:../src/tcg/tcg.c:777:tcg_register_thread: | assertion failed: (n < tcg_max_ctxs) Signed-off-by: Clément Chigot <chigot@adacore.com> Reviewed-by: Francisco Iglesias <frasse.iglesias@gmail.com> Tested-by: Francisco Iglesias <frasse.iglesias@gmail.com> Message-id: 20230524143714.565792-1-chigot@adacore.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org> (cherry picked from commit c9ba1c9f02cfede5329f504cdda6fd3a256e0434) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-05-31hw/dma/xilinx_axidma: Check DMASR.HALTED to prevent infinite loop.Tommy Wu
When we receive a packet from the xilinx_axienet and then try to s2mem through the xilinx_axidma, if the descriptor ring buffer is full in the xilinx axidma driver, we’ll assert the DMASR.HALTED in the function : stream_process_s2mem and return 0. In the end, we’ll be stuck in an infinite loop in axienet_eth_rx_notify. This patch checks the DMASR.HALTED state when we try to push data from xilinx axi-enet to xilinx axi-dma. When the DMASR.HALTED is asserted, we will not keep pushing the data and then prevent the infinte loop. Signed-off-by: Tommy Wu <tommy.wu@sifive.com> Reviewed-by: Edgar E. Iglesias <edgar@zeroasic.com> Reviewed-by: Frank Chang <frank.chang@sifive.com> Message-id: 20230519062137.1251741-1-tommy.wu@sifive.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org> (cherry picked from commit 31afe04586efeccb80cc36ffafcd0e32a3245ffb) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-05-31ui/sdl2: disable SDL_HINT_GRAB_KEYBOARD on WindowsVolker Rümelin
Windows sends an extra left control key up/down input event for every right alt key up/down input event for keyboards with international layout. Since commit 830473455f ("ui/sdl2: fix handling of AltGr key on Windows") QEMU uses a Windows low level keyboard hook procedure to reliably filter out the special left control key and to grab the keyboard on Windows. The SDL2 version 2.0.16 introduced its own Windows low level keyboard hook procedure to grab the keyboard. Windows calls this callback before the QEMU keyboard hook procedure. This disables the special left control key filter when the keyboard is grabbed. To fix the problem, disable the SDL2 Windows low level keyboard hook procedure. Reported-by: Bernhard Beschow <shentey@gmail.com> Signed-off-by: Volker Rümelin <vr_qemu@t-online.de> Reviewed-by: Thomas Huth <thuth@redhat.com> Tested-by: Bernhard Beschow <shentey@gmail.com> Message-Id: <20230418062823.5683-1-vr_qemu@t-online.de> (cherry picked from commit 1dfea3f212e43bfd59d1e1f40b9776db440b211f) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-05-31ui/sdl2: Grab Alt+F4 also under WindowsBernhard Beschow
SDL doesn't grab Alt+F4 under Windows by default. Pressing Alt+F4 thus closes the VM immediately without confirmation, possibly leading to data loss. Fix this by always grabbing Alt+F4 on Windows hosts, too. Signed-off-by: Bernhard Beschow <shentey@gmail.com> Reviewed-by: Volker Rümelin <vr_qemu@t-online.de> Message-Id: <20230417192139.43263-3-shentey@gmail.com> (cherry picked from commit 083db9db44c89d7ea7f81844302194d708bcff2b) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-05-31ui/sdl2: Grab Alt+Tab also in fullscreen modeBernhard Beschow
By default, SDL grabs Alt+Tab only in non-fullscreen mode. This causes Alt+Tab to switch tasks on the host rather than in the VM in fullscreen mode while it switches tasks in non-fullscreen mode in the VM. Fix this confusing behavior by grabbing Alt+Tab in fullscreen mode, always causing tasks to be switched in the VM. Signed-off-by: Bernhard Beschow <shentey@gmail.com> Reviewed-by: Volker Rümelin <vr_qemu@t-online.de> Message-Id: <20230417192139.43263-2-shentey@gmail.com> (cherry picked from commit efc00a37090eced53bff8b42d26991252aaacc44) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-05-31ui/sdl2: fix surface_gl_update_texture: Assertion 'gls' failedMarc-André Lureau
Before sdl2_gl_update() is called, sdl2_gl_switch() may decide to destroy the console window and its associated shaders. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1644 Fixes: c84ab0a500a8 ("ui/console: optionally update after gfx switch") Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Tested-by: Bin Meng <bin.meng@windriver.com> Message-Id: <20230511074217.4171842-1-marcandre.lureau@redhat.com> (cherry picked from commit b3a654d82ecf276b59a67b2fd688e11a0d8a0064) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-05-31ui/gtk-egl: fix scaling for cursor position in scanout modeErico Nunes
vc->gfx.w and vc->gfx.h are not updated appropriately in this code path, which leads to a different scaling factor for rendering the cursor on some edge cases (e.g. the focus has left and re-entered the gtk window). This can be reproduced using vhost-user-gpu with the gtk ui on the x11 backend. Use the surface dimensions which are already updated accordingly. Signed-off-by: Erico Nunes <ernunes@redhat.com> Acked-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20230320160856.364319-2-ernunes@redhat.com> (cherry picked from commit f8a951bb951140a585341c700ebeec58d83f7bbc) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-05-31ui/gtk: use widget size for cursor motion eventErico Nunes
The gd_motion_event size has some calculations for the cursor position, which also take into account things like different size of the framebuffer compared to the window size. The use of window size makes things more difficult though, as at least in the case of Wayland includes the size of ui elements like a menu bar at the top of the window. This leads to a wrong position calculation by a few pixels. Fix it by using the size of the widget, which already returns the size of the actual space to render the framebuffer. Signed-off-by: Erico Nunes <ernunes@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Message-Id: <20230320160856.364319-1-ernunes@redhat.com> (cherry picked from commit 2f31663ed4b5631b5e1c79f5cdd6463e55410eb8) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-05-31ui/gtk: fix passing y0_top parameter to scanoutErico Nunes
The dmabuf->y0_top flag is passed to .dpy_gl_scanout_dmabuf(), however in the gtk ui both implementations dropped it when doing the next scanout_texture call. Fixes flipped linux console using vhost-user-gpu with the gtk ui display. Signed-off-by: Erico Nunes <ernunes@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20230220175605.43759-1-ernunes@redhat.com> (cherry picked from commit 94400fa53f81c9f58ad88cf3f3e7ea89ec423d39) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-05-31hw/ppc/prep: Fix wiring of PIC -> CPU interruptBernhard Beschow
Commit cef2e7148e32 ("hw/isa/i82378: Remove intermediate IRQ forwarder") passes s->cpu_intr to i8259_init() in i82378_realize() directly. However, s- >cpu_intr isn't initialized yet since that happens after the south bridge's pci_realize_and_unref() in board code. Fix this by initializing s->cpu_intr before realizing the south bridge. Fixes: cef2e7148e32 ("hw/isa/i82378: Remove intermediate IRQ forwarder") Signed-off-by: Bernhard Beschow <shentey@gmail.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20230304114043.121024-4-shentey@gmail.com> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> (cherry picked from commit 2237af5e60ada06d90bf714e85523deafd936b9b) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-05-31scripts/device-crash-test: Add a parameter to run with TCG onlyThomas Huth
We're currently facing the problem that the device-crash-test script runs twice as long in the CI when a runner supports KVM - which sometimes results in a timeout of the CI job. To get a more deterministic runtime here, add an option to the script that allows to run it with TCG only. Reported-by: Eldon Stegall <eldon-qemu@eldondev.com> Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20230414145845.456145-3-thuth@redhat.com> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20230424092249.58552-6-alex.bennee@linaro.org> (cherry picked from commit 8b869aa59109d238fd684e1ade204b6942202120) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-05-31gitlab-ci: Avoid to re-run "configure" in the device-crash-test jobsThomas Huth
After "make check-venv" had been added to these jobs, they started to re-run "configure" each time since our logic in the makefile thinks that some files are out of date here. Avoid it with the same trick that we are using in buildtest-template.yml already by disabling the up-to-date check via NINJA=":". Fixes: 1d8cf47e5b ("tests: run 'device-crash-test' from tests/venv") Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20230414145845.456145-2-thuth@redhat.com> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20230424092249.58552-5-alex.bennee@linaro.org> (cherry picked from commit 4d3bd91b26a69b39a178744d3d6e5f23050afb23) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-05-29Update version for 7.2.3 releasev7.2.3Michael Tokarev
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-05-28machine: do not crash if default RAM backend name has been stolenIgor Mammedov
QEMU aborts when default RAM backend should be used (i.e. no explicit '-machine memory-backend=' specified) but user has created an object which 'id' equals to default RAM backend name used by board. $QEMU -machine pc \ -object memory-backend-ram,id=pc.ram,size=4294967296 Actual results: QEMU 7.2.0 monitor - type 'help' for more information (qemu) Unexpected error in object_property_try_add() at ../qom/object.c:1239: qemu-kvm: attempt to add duplicate property 'pc.ram' to object (type 'container') Aborted (core dumped) Instead of abort, check for the conflicting 'id' and exit with an error, suggesting how to remedy the issue. Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2207886 Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20230522131717.3780533-1-imammedo@redhat.com> Tested-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Shaoqin Huang <shahuang@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com> (cherry picked from commit a37531f2381c4e294e48b1417089474128388b44) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-05-28hw/scsi/lsi53c895a: Fix reentrancy issues in the LSI controller (CVE-2023-0330)Thomas Huth
We cannot use the generic reentrancy guard in the LSI code, so we have to manually prevent endless reentrancy here. The problematic lsi_execute_script() function has already a way to detect whether too many instructions have been executed - we just have to slightly change the logic here that it also takes into account if the function has been called too often in a reentrant way. The code in fuzz-lsi53c895a-test.c has been taken from an earlier patch by Mauro Matteo Cascella. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1563 Message-Id: <20230522091011.1082574-1-thuth@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alexander Bulekov <alxndr@bu.edu> Signed-off-by: Thomas Huth <thuth@redhat.com> (cherry picked from commit b987718bbb1d0eabf95499b976212dd5f0120d75) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-05-28usb/ohci: Set pad to 0 after frame updatePaolo Bonzini
When the OHCI controller's framenumber is incremented, HccaPad1 register should be set to zero (Ref OHCI Spec 4.4) ReactOS uses hccaPad1 to determine if the OHCI hardware is running, consequently it fails this check in current qemu master. Signed-off-by: Ryan Wendland <wendland@live.com.au> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1048 Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> (cherry picked from commit 6301460ce9f59885e8feb65185bcfb6b128c8eff) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-05-28util/vfio-helpers: Use g_file_read_link()Akihiko Odaki
When _FORTIFY_SOURCE=2, glibc version is 2.35, and GCC version is 12.1.0, the compiler complains as follows: In file included from /usr/include/features.h:490, from /usr/include/bits/libc-header-start.h:33, from /usr/include/stdint.h:26, from /usr/lib/gcc/aarch64-unknown-linux-gnu/12.1.0/include/stdint.h:9, from /home/alarm/q/var/qemu/include/qemu/osdep.h:94, from ../util/vfio-helpers.c:13: In function 'readlink', inlined from 'sysfs_find_group_file' at ../util/vfio-helpers.c:116:9, inlined from 'qemu_vfio_init_pci' at ../util/vfio-helpers.c:326:18, inlined from 'qemu_vfio_open_pci' at ../util/vfio-helpers.c:517:9: /usr/include/bits/unistd.h:119:10: error: argument 2 is null but the corresponding size argument 3 value is 4095 [-Werror=nonnull] 119 | return __glibc_fortify (readlink, __len, sizeof (char), | ^~~~~~~~~~~~~~~ This error implies the allocated buffer can be NULL. Use g_file_read_link(), which allocates buffer automatically to avoid the error. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com> (cherry picked from commit dbdea0dbfe2cef9ef6c752e9077e4fc98724194c) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-05-28rtl8139: fix large_send_mss divide-by-zeroStefan Hajnoczi
If the driver sets large_send_mss to 0 then a divide-by-zero occurs. Even if the division wasn't a problem, the for loop that emits MSS-sized packets would never terminate. Solve these issues by skipping offloading when large_send_mss=0. This issue was found by OSS-Fuzz as part of Alexander Bulekov's device fuzzing work. The reproducer is: $ cat << EOF | ./qemu-system-i386 -display none -machine accel=qtest, -m \ 512M,slots=1,maxmem=0xffff000000000000 -machine q35 -nodefaults -device \ rtl8139,netdev=net0 -netdev user,id=net0 -device \ pc-dimm,id=nv1,memdev=mem1,addr=0xb800a64602800000 -object \ memory-backend-ram,id=mem1,size=2M -qtest stdio outl 0xcf8 0x80000814 outl 0xcfc 0xe0000000 outl 0xcf8 0x80000804 outw 0xcfc 0x06 write 0xe0000037 0x1 0x04 write 0xe00000e0 0x2 0x01 write 0x1 0x1 0x04 write 0x3 0x1 0x98 write 0xa 0x1 0x8c write 0xb 0x1 0x02 write 0xc 0x1 0x46 write 0xd 0x1 0xa6 write 0xf 0x1 0xb8 write 0xb800a646028c000c 0x1 0x08 write 0xb800a646028c000e 0x1 0x47 write 0xb800a646028c0010 0x1 0x02 write 0xb800a646028c0017 0x1 0x06 write 0xb800a646028c0036 0x1 0x80 write 0xe00000d9 0x1 0x40 EOF Buglink: https://gitlab.com/qemu-project/qemu/-/issues/1582 Closes: https://gitlab.com/qemu-project/qemu/-/issues/1582 Cc: qemu-stable@nongnu.org Cc: Peter Maydell <peter.maydell@linaro.org> Fixes: 6d71357a3b65 ("rtl8139: honor large send MSS value") Reported-by: Alexander Bulekov <alxndr@bu.edu> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Tested-by: Alexander Bulekov <alxndr@bu.edu> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> (cherry picked from commit 792676c165159c11412346870fd58fd243ab2166) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-05-23e1000e: Fix tx/rx counterstimothee.cocault@gmail.com
The bytes and packets counter registers are cleared on read. Copying the "total counter" registers to the "good counter" registers has side effects. If the "total" register is never read by the OS, it only gets incremented. This leads to exponential growth of the "good" register. This commit increments the counters individually to avoid this. Signed-off-by: Timothée Cocault <timothee.cocault@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> (cherry picked from commit 8d689f6aae8be096b4a1859be07c1b083865f755) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> (Mjt: removed hw/net/igb_core.c part: igb introduced in 8.0)
2023-05-23e1000: Count CRC in Tx statisticsAkihiko Odaki
The Software Developer's Manual 13.7.4.5 "Packets Transmitted (64 Bytes) Count" says: > This register counts the number of packets transmitted that are > exactly 64 bytes (from <Destination Address> through <CRC>, > inclusively) in length. It also says similar for the other Tx statistics registers. Add the number of bytes for CRC to those registers. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com> (cherry picked from commit c50b152485d4e10dfa1e1d7ea668f29a5fb92e9c) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> (Mjt: pick this for 7.2 too: a fix by its own and makes next patch to apply cleanly)
2023-05-22virtio-crypto: fix NULL pointer dereference in virtio_crypto_free_requestMauro Matteo Cascella
Ensure op_info is not NULL in case of QCRYPTODEV_BACKEND_ALG_SYM algtype. Fixes: 0e660a6f90a ("crypto: Introduce RSA algorithm") Signed-off-by: Mauro Matteo Cascella <mcascell@redhat.com> Reported-by: Yiming Tao <taoym@zju.edu.cn> Message-Id: <20230509075317.1132301-1-mcascell@redhat.com> Reviewed-by: Gonglei <arei.gonglei@huawei.com> Reviewed-by: zhenwei pi<pizhenwei@bytedance.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> (cherry picked from commit 3e69908907f8d3dd20d5753b0777a6e3824ba824) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> (Mjt: context tweak after 999c789f00 cryptodev: Introduce cryptodev alg type in QAPI)
2023-05-22virtio-net: not enable vq reset feature unconditionallyEugenio Pérez
The commit 93a97dc5200a ("virtio-net: enable vq reset feature") enables unconditionally vq reset feature as long as the device is emulated. This makes impossible to actually disable the feature, and it causes migration problems from qemu version previous than 7.2. The entire final commit is unneeded as device system already enable or disable the feature properly. This reverts commit 93a97dc5200a95e63b99cb625f20b7ae802ba413. Fixes: 93a97dc5200a ("virtio-net: enable vq reset feature") Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20230504101447.389398-1-eperezma@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> (cherry picked from commit 1fac00f70b3261050af5564b20ca55c1b2a3059a) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-05-19vhost: fix possible wrap in SVQ descriptor ringHawkins Jiawei
QEMU invokes vhost_svq_add() when adding a guest's element into SVQ. In vhost_svq_add(), it uses vhost_svq_available_slots() to check whether QEMU can add the element into SVQ. If there is enough space, then QEMU combines some out descriptors and some in descriptors into one descriptor chain, and adds it into `svq->vring.desc` by vhost_svq_vring_write_descs(). Yet the problem is that, `svq->shadow_avail_idx - svq->shadow_used_idx` in vhost_svq_available_slots() returns the number of occupied elements, or the number of descriptor chains, instead of the number of occupied descriptors, which may cause wrapping in SVQ descriptor ring. Here is an example. In vhost_handle_guest_kick(), QEMU forwards as many available buffers to device by virtqueue_pop() and vhost_svq_add_element(). virtqueue_pop() returns a guest's element, and then this element is added into SVQ by vhost_svq_add_element(), a wrapper to vhost_svq_add(). If QEMU invokes virtqueue_pop() and vhost_svq_add_element() `svq->vring.num` times, vhost_svq_available_slots() thinks QEMU just ran out of slots and everything should work fine. But in fact, virtqueue_pop() returns `svq->vring.num` elements or descriptor chains, more than `svq->vring.num` descriptors due to guest memory fragmentation, and this causes wrapping in SVQ descriptor ring. This bug is valid even before marking the descriptors used. If the guest memory is fragmented, SVQ must add chains so it can try to add more descriptors than possible. This patch solves it by adding `num_free` field in VhostShadowVirtqueue structure and updating this field in vhost_svq_add() and vhost_svq_get_buf(), to record the number of free descriptors. Fixes: 100890f7ca ("vhost: Shadow virtqueue buffers forwarding") Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20230509084817.3973-1-yin31149@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> (cherry picked from commit 5d410557dea452f6231a7c66155e29a37e168528) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-05-18target/i386: fix avx2 instructions vzeroall and vpermdqXinyu Li
vzeroall: xmm_regs should be used instead of xmm_t0 vpermdq: bit 3 and 7 of imm should be considered Signed-off-by: Xinyu Li <lixinyu20s@ict.ac.cn> Message-Id: <20230510145222.586487-1-lixinyu20s@ict.ac.cn> Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> (cherry picked from commit 056d649007bc9fdae9f1d576e77c1316e9a34468) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-05-18target/i386: fix operand size for VCOMI/VUCOMI instructionsPaolo Bonzini
Compared to other SSE instructions, VUCOMISx and VCOMISx are different: the single and double precision versions are distinguished through a prefix, however they use no-prefix and 0x66 for SS and SD respectively. Scalar values usually are associated with 0xF2 and 0xF3. Because of these, they incorrectly perform a 128-bit memory load instead of a 32- or 64-bit load. Fix this by writing a custom decoding function. I tested that the reproducer is fixed and the test-avx output does not change. Reported-by: Gabriele Svelto <gsvelto@mozilla.com> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1637 Fixes: f8d19eec0d53 ("target/i386: reimplement 0x0f 0x28-0x2f, add AVX", 2022-10-18) Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> (cherry picked from commit 2b55e479e6fcbb466585fd25077a50c32e10dc3a) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-05-18scsi-generic: fix buffer overflow on block limits inquiryPaolo Bonzini
Using linux 6.x guest, at boot time, an inquiry on a scsi-generic device makes qemu crash. This is caused by a buffer overflow when scsi-generic patches the block limits VPD page. Do the operations on a temporary on-stack buffer that is guaranteed to be large enough. Reported-by: Théo Maillart <tmaillart@freebox.fr> Analyzed-by: Théo Maillart <tmaillart@freebox.fr> Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> (cherry picked from commit 9bd634b2f5e2f10fe35d7609eb83f30583f2e15a) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-05-18target/arm: Fix vd == vm overlap in sve_ldff1_zRichard Henderson
If vd == vm, copy vm to scratch, so that we can pre-zero the output and still access the gather indicies. Cc: qemu-stable@nongnu.org Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1612 Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20230504104232.1877774-1-richard.henderson@linaro.org Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> (cherry picked from commit a6771f2f5cbfbf312e2fb5b1627f38a6bf6321d0) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-05-18migration: Attempt disk reactivation in more failure scenariosEric Blake
Commit fe904ea824 added a fail_inactivate label, which tries to reactivate disks on the source after a failure while s->state == MIGRATION_STATUS_ACTIVE, but didn't actually use the label if qemu_savevm_state_complete_precopy() failed. This failure to reactivate is also present in commit 6039dd5b1c (also covering the new s->state == MIGRATION_STATUS_DEVICE state) and 403d18ae (ensuring s->block_inactive is set more reliably). Consolidate the two labels back into one - no matter HOW migration is failed, if there is any chance we can reach vm_start() after having attempted inactivation, it is essential that we have tried to restart disks before then. This also makes the cleanup more like migrate_fd_cancel(). Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20230502205212.134680-1-eblake@redhat.com> Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> (cherry picked from commit 6dab4c93ecfae48e2e67b984d1032c1e988d3005) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> (Mjt: minor context tweak near added comment in migration/migration.c)
2023-05-18migration: Minor control flow simplificationEric Blake
No need to declare a temporary variable. Suggested-by: Juan Quintela <quintela@redhat.com> Fixes: 1df36e8c6289 ("migration: Handle block device inactivation failures better") Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> (cherry picked from commit 5d39f44d7ac5c63f53d4d0900ceba9521bc27e49) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-05-18migration: Handle block device inactivation failures betterEric Blake
Consider what happens when performing a migration between two host machines connected to an NFS server serving multiple block devices to the guest, when the NFS server becomes unavailable. The migration attempts to inactivate all block devices on the source (a necessary step before the destination can take over); but if the NFS server is non-responsive, the attempt to inactivate can itself fail. When that happens, the destination fails to get the migrated guest (good, because the source wasn't able to flush everything properly): (qemu) qemu-kvm: load of migration failed: Input/output error at which point, our only hope for the guest is for the source to take back control. With the current code base, the host outputs a message, but then appears to resume: (qemu) qemu-kvm: qemu_savevm_state_complete_precopy_non_iterable: bdrv_inactivate_all() failed (-1) (src qemu)info status VM status: running but a second migration attempt now asserts: (src qemu) qemu-kvm: ../block.c:6738: int bdrv_inactivate_recurse(BlockDriverState *): Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed. Whether the guest is recoverable on the source after the first failure is debatable, but what we do not want is to have qemu itself fail due to an assertion. It looks like the problem is as follows: In migration.c:migration_completion(), the source sets 'inactivate' to true (since COLO is not enabled), then tries savevm.c:qemu_savevm_state_complete_precopy() with a request to inactivate block devices. In turn, this calls block.c:bdrv_inactivate_all(), which fails when flushing runs up against the non-responsive NFS server. With savevm failing, we are now left in a state where some, but not all, of the block devices have been inactivated; but migration_completion() then jumps to 'fail' rather than 'fail_invalidate' and skips an attempt to reclaim those those disks by calling bdrv_activate_all(). Even if we do attempt to reclaim disks, we aren't taking note of failure there, either. Thus, we have reached a state where the migration engine has forgotten all state about whether a block device is inactive, because we did not set s->block_inactive in enough places; so migration allows the source to reach vm_start() and resume execution, violating the block layer invariant that the guest CPUs should not be restarted while a device is inactive. Note that the code in migration.c:migrate_fd_cancel() will also try to reactivate all block devices if s->block_inactive was set, but because we failed to set that flag after the first failure, the source assumes it has reclaimed all devices, even though it still has remaining inactivated devices and does not try again. Normally, qmp_cont() will also try to reactivate all disks (or correctly fail if the disks are not reclaimable because NFS is not yet back up), but the auto-resumption of the source after a migration failure does not go through qmp_cont(). And because we have left the block layer in an inconsistent state with devices still inactivated, the later migration attempt is hitting the assertion failure. Since it is important to not resume the source with inactive disks, this patch marks s->block_inactive before attempting inactivation, rather than after succeeding, in order to prevent any vm_start() until it has successfully reactivated all devices. See also https://bugzilla.redhat.com/show_bug.cgi?id=2058982 Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Acked-by: Lukas Straub <lukasstraub2@web.de> Tested-by: Lukas Straub <lukasstraub2@web.de> Signed-off-by: Juan Quintela <quintela@redhat.com> (cherry picked from commit 403d18ae384239876764bbfa111d6cc5dcb673d1) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-05-18linux-user: fix getgroups/setgroups allocationsMichael Tokarev
linux-user getgroups(), setgroups(), getgroups32() and setgroups32() used alloca() to allocate grouplist arrays, with unchecked gidsetsize coming from the "guest". With NGROUPS_MAX being 65536 (linux, and it is common for an application to allocate NGROUPS_MAX for getgroups()), this means a typical allocation is half the megabyte on the stack. Which just overflows stack, which leads to immediate SIGSEGV in actual system getgroups() implementation. An example of such issue is aptitude, eg https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=811087#72 Cap gidsetsize to NGROUPS_MAX (return EINVAL if it is larger than that), and use heap allocation for grouplist instead of alloca(). While at it, fix coding style and make all 4 implementations identical. Try to not impose random limits - for example, allow gidsetsize to be negative for getgroups() - just do not allocate negative-sized grouplist in this case but still do actual getgroups() call. But do not allow negative gidsetsize for setgroups() since its argument is unsigned. Capping by NGROUPS_MAX seems a bit arbitrary, - we can do more, it is not an error if set size will be NGROUPS_MAX+1. But we should not allow integer overflow for the array being allocated. Maybe it is enough to just call g_try_new() and return ENOMEM if it fails. Maybe there's also no need to convert setgroups() since this one is usually smaller and known beforehand (KERN_NGROUPS_MAX is actually 63, - this is apparently a kernel-imposed limit for runtime group set). The patch fixes aptitude segfault mentioned above. Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> Message-Id: <20230409105327.1273372-1-mjt@msgid.tls.msk.ru> Signed-off-by: Laurent Vivier <laurent@vivier.eu> (cherry picked from commit 1e35d327890bdd117a67f79c52e637fb12bb1bf4) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-05-18linux-user: Fix mips fp64 executables loadingDaniil Kovalev
If a program requires fr1, we should set the FR bit of CP0 control status register and add F64 hardware flag. The corresponding `else if` branch statement is copied from the linux kernel sources (see `arch_check_elf` function in linux/arch/mips/kernel/elf.c). Signed-off-by: Daniil Kovalev <dkovalev@compiler-toolchain-for.me> Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Message-Id: <20230404052153.16617-1-dkovalev@compiler-toolchain-for.me> Signed-off-by: Laurent Vivier <laurent@vivier.eu> (cherry picked from commit a0f8d2701b205d9d7986aa555e0566b13dc18fa0) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-05-18tests/docker: bump the xtensa base to debian:11-slimAlex Bennée
Stretch is going out of support so things like security updates will fail. As the toolchain itself is binary it hopefully won't mind the underlying OS being updated. Message-Id: <20230503091244.1450613-3-alex.bennee@linaro.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reported-by: Richard Henderson <richard.henderson@linaro.org> (cherry picked from commit 3217b84f3cd813a7daffc64b26543c313f3a042a) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-05-18target/ppc: Fix helper_pminsn() prototypeCédric Le Goater
GCC13 reports an error: ../target/ppc/excp_helper.c:2625:6: error: conflicting types for ‘helper_pminsn’ due to enum/integer mismatch; have ‘void(CPUPPCState *, powerpc_pm_insn_t)’ {aka ‘void(struct CPUArchState *, powerpc_pm_insn_t)’} [-Werror=enum-int-mismatch] 2625 | void helper_pminsn(CPUPPCState *env, powerpc_pm_insn_t insn) | ^~~~~~~~~~~~~ In file included from /home/legoater/work/qemu/qemu.git/include/qemu/osdep.h:49, from ../target/ppc/excp_helper.c:19: /home/legoater/work/qemu/qemu.git/include/exec/helper-head.h:23:27: note: previous declaration of ‘helper_pminsn’ with type ‘void(CPUArchState *, uint32_t)’ {aka ‘void(CPUArchState *, unsigned int)’} 23 | #define HELPER(name) glue(helper_, name) | ^~~~~~~ Fixes: 7778a575c7 ("ppc: Add P7/P8 Power Management instructions") Signed-off-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20230321161609.716474-4-clg@kaod.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com> (cherry picked from commit 07e4804fcde1559aaa335fd680487ba308d86fb3) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-05-18Revert "vhost-user: Introduce nested event loop in vhost_user_read()"Greg Kurz
This reverts commit a7f523c7d114d445c5d83aecdba3efc038e5a692. The nested event loop is broken by design. It's only user was removed. Drop the code as well so that nobody ever tries to use it again. I had to fix a couple of trivial conflicts around return values because of 025faa872bcf ("vhost-user: stick to -errno error return convention"). Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20230119172424.478268-3-groug@kaod.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com> (cherry picked from commit 4382138f642f69fdbc79ebf4e93d84be8061191f) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-05-18Revert "vhost-user: Monitor slave channel in vhost_user_read()"Greg Kurz
This reverts commit db8a3772e300c1a656331a92da0785d81667dc81. Motivation : this is breaking vhost-user with DPDK as reported in [0]. Received unexpected msg type. Expected 22 received 40 Fail to update device iotlb Received unexpected msg type. Expected 40 received 22 Received unexpected msg type. Expected 22 received 11 Fail to update device iotlb Received unexpected msg type. Expected 11 received 22 vhost VQ 1 ring restore failed: -71: Protocol error (71) Received unexpected msg type. Expected 22 received 11 Fail to update device iotlb Received unexpected msg type. Expected 11 received 22 vhost VQ 0 ring restore failed: -71: Protocol error (71) unable to start vhost net: 71: falling back on userspace virtio The failing sequence that leads to the first error is : - QEMU sends a VHOST_USER_GET_STATUS (40) request to DPDK on the master socket - QEMU starts a nested event loop in order to wait for the VHOST_USER_GET_STATUS response and to be able to process messages from the slave channel - DPDK sends a couple of legitimate IOTLB miss messages on the slave channel - QEMU processes each IOTLB request and sends VHOST_USER_IOTLB_MSG (22) updates on the master socket - QEMU assumes to receive a response for the latest VHOST_USER_IOTLB_MSG but it gets the response for the VHOST_USER_GET_STATUS instead The subsequent errors have the same root cause : the nested event loop breaks the order by design. It lures QEMU to expect responses to the latest message sent on the master socket to arrive first. Since this was only needed for DAX enablement which is still not merged upstream, just drop the code for now. A working solution will have to be merged later on. Likely protect the master socket with a mutex and service the slave channel with a separate thread, as discussed with Maxime in the mail thread below. [0] https://lore.kernel.org/qemu-devel/43145ede-89dc-280e-b953-6a2b436de395@redhat.com/ Reported-by: Yanghang Liu <yanghliu@redhat.com> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2155173 Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20230119172424.478268-2-groug@kaod.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com> (cherry picked from commit f340a59d5a852d75ae34555723694c7e8eafbd0c) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-05-18xen/pt: reserve PCI slot 2 for Intel igd-passthruChuck Zmudzinski
Intel specifies that the Intel IGD must occupy slot 2 on the PCI bus, as noted in docs/igd-assign.txt in the Qemu source code. Currently, when the xl toolstack is used to configure a Xen HVM guest with Intel IGD passthrough to the guest with the Qemu upstream device model, a Qemu emulated PCI device will occupy slot 2 and the Intel IGD will occupy a different slot. This problem often prevents the guest from booting. The only available workarounds are not good: Configure Xen HVM guests to use the old and no longer maintained Qemu traditional device model available from xenbits.xen.org which does reserve slot 2 for the Intel IGD or use the "pc" machine type instead of the "xenfv" machine type and add the xen platform device at slot 3 using a command line option instead of patching qemu to fix the "xenfv" machine type directly. The second workaround causes some degredation in startup performance such as a longer boot time and reduced resolution of the grub menu that is displayed on the monitor. This patch avoids that reduced startup performance when using the Qemu upstream device model for Xen HVM guests configured with the igd-passthru=on option. To implement this feature in the Qemu upstream device model for Xen HVM guests, introduce the following new functions, types, and macros: * XEN_PT_DEVICE_CLASS declaration, based on the existing TYPE_XEN_PT_DEVICE * XEN_PT_DEVICE_GET_CLASS macro helper function for XEN_PT_DEVICE_CLASS * typedef XenPTQdevRealize function pointer * XEN_PCI_IGD_SLOT_MASK, the value of slot_reserved_mask to reserve slot 2 * xen_igd_reserve_slot and xen_igd_clear_slot functions Michael Tsirkin: * Introduce XEN_PCI_IGD_DOMAIN, XEN_PCI_IGD_BUS, XEN_PCI_IGD_DEV, and XEN_PCI_IGD_FN - use them to compute the value of XEN_PCI_IGD_SLOT_MASK The new xen_igd_reserve_slot function uses the existing slot_reserved_mask member of PCIBus to reserve PCI slot 2 for Xen HVM guests configured using the xl toolstack with the gfx_passthru option enabled, which sets the igd-passthru=on option to Qemu for the Xen HVM machine type. The new xen_igd_reserve_slot function also needs to be implemented in hw/xen/xen_pt_stub.c to prevent FTBFS during the link stage for the case when Qemu is configured with --enable-xen and --disable-xen-pci-passthrough, in which case it does nothing. The new xen_igd_clear_slot function overrides qdev->realize of the parent PCI device class to enable the Intel IGD to occupy slot 2 on the PCI bus since slot 2 was reserved by xen_igd_reserve_slot when the PCI bus was created in hw/i386/pc_piix.c for the case when igd-passthru=on. Move the call to xen_host_pci_device_get, and the associated error handling, from xen_pt_realize to the new xen_igd_clear_slot function to initialize the device class and vendor values which enables the checks for the Intel IGD to succeed. The verification that the host device is an Intel IGD to be passed through is done by checking the domain, bus, slot, and function values as well as by checking that gfx_passthru is enabled, the device class is VGA, and the device vendor in Intel. Signed-off-by: Chuck Zmudzinski <brchuckz@aol.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Message-Id: <b1b4a21fe9a600b1322742dda55a40e9961daa57.1674346505.git.brchuckz@aol.com> Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> (cherry picked from commit 4f67543bb8c5b031c2ad3785c1a2f3c255d72b25) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-05-189pfs/xen: Fix segfault on shutdownJason Andryuk
xen_9pfs_free can't use gnttabdev since it is already closed and NULL-ed out when free is called. Do the teardown in _disconnect(). This matches the setup done in _connect(). trace-events are also added for the XenDevOps functions. Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Message-Id: <20230502143722.15613-1-jandryuk@gmail.com> [C.S.: - Remove redundant return in xen_9pfs_free(). - Add comment to trace-events. ] Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com> (cherry picked from commit 92e667f6fd5806a6a705a2a43e572bd9ec6819da) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> (Mjt: minor context conflict in hw/9pfs/xen-9p-backend.c)
2023-05-18s390x/tcg: Fix LDER instruction formatIlya Leoshkevich
It's RRE, not RXE. Found by running valgrind's none/tests/s390x/bfp-2. Fixes: 86b59624c4aa ("s390x/tcg: Implement LOAD LENGTHENED short HFP to long HFP") Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Message-Id: <20230511134726.469651-1-iii@linux.ibm.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com> (cherry picked from commit 970641de01908dd09b569965e78f13842e5854bc) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> (Mjt: context tweak)