aboutsummaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2024-09-12softmmu/physmem: fix memory leak in dirty_memory_extend()David Hildenbrand
As reported by Peter, we might be leaking memory when removing the highest RAMBlock (in the weird ram_addr_t space), and adding a new one. We will fail to realize that we already allocated bitmaps for more dirty memory blocks, and effectively discard the pointers to them. Fix it by getting rid of last_ram_page() and by remembering the number of dirty memory blocks that have been allocated already. While at it, let's use "unsigned int" for the number of blocks, which should be sufficient until we reach ~32 exabytes. Looks like this leak was introduced as we switched from using a single bitmap_zero_extend() to allocating multiple bitmaps: bitmap_zero_extend() relies on g_renew() which should have taken care of this. Resolves: https://lkml.kernel.org/r/CAFEAcA-k7a+VObGAfCFNygQNfCKL=AfX6A4kScq=VSSK0peqPg@mail.gmail.com Reported-by: Peter Maydell <peter.maydell@linaro.org> Fixes: 5b82b703b69a ("memory: RCU ram_list.dirty_memory[] for safe RAM hotplug") Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Tested-by: Peter Maydell <peter.maydell@linaro.org> Cc: qemu-stable@nongnu.org Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20240828090743.128647-1-david@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> (cherry picked from commit b84f06c2bee727b3870b4eeccbe3a45c5aea14c1) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> (Mjt: context fix due to lack of v9.0.0-rc4-49-g15f7a80c49cb "RAMBlock: Add support of KVM private guest memfd")
2024-08-28Revert "replay: stop us hanging in rr_wait_io_event"Nicholas Piggin
This reverts commit 1f881ea4a444ef36a8b6907b0b82be4b3af253a2. That commit causes reverse_debugging.py test failures, and does not seem to solve the root cause of the problem x86-64 still hangs in record/replay tests. The problem with short-cutting the iowait that was taken during record phase is that related events will not get consumed at the same points (e.g., reading the clock). A hang with zero icount always seems to be a symptom of an earlier problem that has caused the recording to become out of synch with the execution and consumption of events by replay. Acked-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Message-Id: <20240813050638.446172-6-npiggin@gmail.com> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20240813202329.1237572-14-alex.bennee@linaro.org> (cherry picked from commit 94962ff00d09674047aed896e87ba09736cd6941) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> Fixes: https://gitlab.com/qemu-project/qemu/-/issues/2524
2024-08-28nbd/server: CVE-2024-7409: Cap default max-connections to 100Eric Blake
Allowing an unlimited number of clients to any web service is a recipe for a rudimentary denial of service attack: the client merely needs to open lots of sockets without closing them, until qemu no longer has any more fds available to allocate. For qemu-nbd, we default to allowing only 1 connection unless more are explicitly asked for (-e or --shared); this was historically picked as a nice default (without an explicit -t, a non-persistent qemu-nbd goes away after a client disconnects, without needing any additional follow-up commands), and we are not going to change that interface now (besides, someday we want to point people towards qemu-storage-daemon instead of qemu-nbd). But for qemu proper, and the newer qemu-storage-daemon, the QMP nbd-server-start command has historically had a default of unlimited number of connections, in part because unlike qemu-nbd it is inherently persistent until nbd-server-stop. Allowing multiple client sockets is particularly useful for clients that can take advantage of MULTI_CONN (creating parallel sockets to increase throughput), although known clients that do so (such as libnbd's nbdcopy) typically use only 8 or 16 connections (the benefits of scaling diminish once more sockets are competing for kernel attention). Picking a number large enough for typical use cases, but not unlimited, makes it slightly harder for a malicious client to perform a denial of service merely by opening lots of connections withot progressing through the handshake. This change does not eliminate CVE-2024-7409 on its own, but reduces the chance for fd exhaustion or unlimited memory usage as an attack surface. On the other hand, by itself, it makes it more obvious that with a finite limit, we have the problem of an unauthenticated client holding 100 fds opened as a way to block out a legitimate client from being able to connect; thus, later patches will further add timeouts to reject clients that are not making progress. This is an INTENTIONAL change in behavior, and will break any client of nbd-server-start that was not passing an explicit max-connections parameter, yet expects more than 100 simultaneous connections. We are not aware of any such client (as stated above, most clients aware of MULTI_CONN get by just fine on 8 or 16 connections, and probably cope with later connections failing by relying on the earlier connections; libvirt has not yet been passing max-connections, but generally creates NBD servers with the intent for a single client for the sake of live storage migration; meanwhile, the KubeSAN project anticipates a large cluster sharing multiple clients [up to 8 per node, and up to 100 nodes in a cluster], but it currently uses qemu-nbd with an explicit --shared=0 rather than qemu-storage-daemon with nbd-server-start). We considered using a deprecation period (declare that omitting max-parameters is deprecated, and make it mandatory in 3 releases - then we don't need to pick an arbitrary default); that has zero risk of breaking any apps that accidentally depended on more than 100 connections, and where such breakage might not be noticed under unit testing but only under the larger loads of production usage. But it does not close the denial-of-service hole until far into the future, and requires all apps to change to add the parameter even if 100 was good enough. It also has a drawback that any app (like libvirt) that is accidentally relying on an unlimited default should seriously consider their own CVE now, at which point they are going to change to pass explicit max-connections sooner than waiting for 3 qemu releases. Finally, if our changed default breaks an app, that app can always pass in an explicit max-parameters with a larger value. It is also intentional that the HMP interface to nbd-server-start is not changed to expose max-connections (any client needing to fine-tune things should be using QMP). Suggested-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20240807174943.771624-12-eblake@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> [ericb: Expand commit message to summarize Dan's argument for why we break corner-case back-compat behavior without a deprecation period] Signed-off-by: Eric Blake <eblake@redhat.com> (cherry picked from commit c8a76dbd90c2f48df89b75bef74917f90a59b623) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2024-08-28nbd/server: Plumb in new args to nbd_client_add()Eric Blake
Upcoming patches to fix a CVE need to track an opaque pointer passed in by the owner of a client object, as well as request for a time limit on how fast negotiation must complete. Prepare for that by changing the signature of nbd_client_new() and adding an accessor to get at the opaque pointer, although for now the two servers (qemu-nbd.c and blockdev-nbd.c) do not change behavior even though they pass in a new default timeout value. Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20240807174943.771624-11-eblake@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> [eblake: s/LIMIT/MAX_SECS/ as suggested by Dan] Signed-off-by: Eric Blake <eblake@redhat.com> (cherry picked from commit fb1c2aaa981e0a2fa6362c9985f1296b74f055ac) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2024-08-28virtio-net: Fix network stall at the host side waiting for kickthomas
Patch 06b12970174 ("virtio-net: fix network stall under load") added double-check to test whether the available buffer size can satisfy the request or not, in case the guest has added some buffers to the avail ring simultaneously after the first check. It will be lucky if the available buffer size becomes okay after the double-check, then the host can send the packet to the guest. If the buffer size still can't satisfy the request, even if the guest has added some buffers, viritio-net would stall at the host side forever. The patch enables notification and checks whether the guest has added some buffers since last check of available buffers when the available buffers are insufficient. If no buffer is added, return false, else recheck the available buffers in the loop. If the available buffers are sufficient, disable notification and return true. Changes: 1. Change the return type of virtqueue_get_avail_bytes() from void to int, it returns an opaque that represents the shadow_avail_idx of the virtqueue on success, else -1 on error. 2. Add a new API: virtio_queue_enable_notification_and_check(), it takes an opaque as input arg which is returned from virtqueue_get_avail_bytes(). It enables notification firstly, then checks whether the guest has added some buffers since last check of available buffers or not by virtio_queue_poll(), return ture if yes. The patch also reverts patch "06b12970174". The case below can reproduce the stall. Guest 0 +--------+ | iperf | ---------------> | server | Host | +--------+ +--------+ | ... | iperf |---- | client |---- Guest n +--------+ | +--------+ | | iperf | ---------------> | server | +--------+ Boot many guests from qemu with virtio network: qemu ... -netdev tap,id=net_x \ -device virtio-net-pci-non-transitional,\ iommu_platform=on,mac=xx:xx:xx:xx:xx:xx,netdev=net_x Each guest acts as iperf server with commands below: iperf3 -s -D -i 10 -p 8001 iperf3 -s -D -i 10 -p 8002 The host as iperf client: iperf3 -c guest_IP -p 8001 -i 30 -w 256k -P 20 -t 40000 iperf3 -c guest_IP -p 8002 -i 30 -w 256k -P 20 -t 40000 After some time, the host loses connection to the guest, the guest can send packet to the host, but can't receive packet from the host. It's more likely to happen if SWIOTLB is enabled in the guest, allocating and freeing bounce buffer takes some CPU ticks, copying from/to bounce buffer takes more CPU ticks, compared with that there is no bounce buffer in the guest. Once the rate of producing packets from the host approximates the rate of receiveing packets in the guest, the guest would loop in NAPI. receive packets --- | | v | free buf virtnet_poll | | v | add buf to avail ring --- | | need kick the host? | NAPI continues v receive packets --- | | v | free buf virtnet_poll | | v | add buf to avail ring --- | v ... ... On the other hand, the host fetches free buf from avail ring, if the buf in the avail ring is not enough, the host notifies the guest the event by writing the avail idx read from avail ring to the event idx of used ring, then the host goes to sleep, waiting for the kick signal from the guest. Once the guest finds the host is waiting for kick singal (in virtqueue_kick_prepare_split()), it kicks the host. The host may stall forever at the sequences below: Host Guest ------------ ----------- fetch buf, send packet receive packet --- ... ... | fetch buf, send packet add buf | ... add buf virtnet_poll buf not enough avail idx-> add buf | read avail idx add buf | add buf --- receive packet --- write event idx ... | wait for kick add buf virtnet_poll ... | --- no more packet, exit NAPI In the first loop of NAPI above, indicated in the range of virtnet_poll above, the host is sending packets while the guest is receiving packets and adding buffers. step 1: The buf is not enough, for example, a big packet needs 5 buf, but the available buf count is 3. The host read current avail idx. step 2: The guest adds some buf, then checks whether the host is waiting for kick signal, not at this time. The used ring is not empty, the guest continues the second loop of NAPI. step 3: The host writes the avail idx read from avail ring to used ring as event idx via virtio_queue_set_notification(q->rx_vq, 1). step 4: At the end of the second loop of NAPI, recheck whether kick is needed, as the event idx in the used ring written by the host is beyound the range of kick condition, the guest will not send kick signal to the host. Fixes: 06b12970174 ("virtio-net: fix network stall under load") Cc: qemu-stable@nongnu.org Signed-off-by: Wencheng Yang <east.moutain.yang@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> (cherry picked from commit f937309fbdbb48c354220a3e7110c202ae4aa7fa) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> (Mjt: context fixup in include/hw/virtio/virtio.h)
2024-08-28hw/virtio: Fix the de-initialization of vhost-user devicesThomas Huth
The unrealize functions of the various vhost-user devices are calling the corresponding vhost_*_set_status() functions with a status of 0 to shut down the device correctly. Now these vhost_*_set_status() functions all follow this scheme: bool should_start = virtio_device_should_start(vdev, status); if (vhost_dev_is_started(&vvc->vhost_dev) == should_start) { return; } if (should_start) { /* ... do the initialization stuff ... */ } else { /* ... do the cleanup stuff ... */ } The problem here is virtio_device_should_start(vdev, 0) currently always returns "true" since it internally only looks at vdev->started instead of looking at the "status" parameter. Thus once the device got started once, virtio_device_should_start() always returns true and thus the vhost_*_set_status() functions return early, without ever doing any clean-up when being called with status == 0. This causes e.g. problems when trying to hot-plug and hot-unplug a vhost user devices multiple times since the de-initialization step is completely skipped during the unplug operation. This bug has been introduced in commit 9f6bcfd99f ("hw/virtio: move vm_running check to virtio_device_started") which replaced should_start = status & VIRTIO_CONFIG_S_DRIVER_OK; with should_start = virtio_device_started(vdev, status); which later got replaced by virtio_device_should_start(). This blocked the possibility to set should_start to false in case the status flag VIRTIO_CONFIG_S_DRIVER_OK was not set. Fix it by adjusting the virtio_device_should_start() function to only consider the status flag instead of vdev->started. Since this function is only used in the various vhost_*_set_status() functions for exactly the same purpose, it should be fine to fix it in this central place there without any risk to change the behavior of other code. Fixes: 9f6bcfd99f ("hw/virtio: move vm_running check to virtio_device_started") Buglink: https://issues.redhat.com/browse/RHEL-40708 Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20240618121958.88673-1-thuth@redhat.com> Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> (cherry picked from commit d72479b11797c28893e1e3fc565497a9cae5ca16) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2024-06-06virtio-gpu: fix v2 migrationMarc-André Lureau
Commit dfcf74fa ("virtio-gpu: fix scanout migration post-load") broke forward/backward version migration. Versioning of nested VMSD structures is not straightforward, as the wire format doesn't have nested structures versions. Introduce x-scanout-vmstate-version and a field test to save/load appropriately according to the machine version. Fixes: dfcf74fa ("virtio-gpu: fix scanout migration post-load") Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Fiona Ebner <f.ebner@proxmox.com> Tested-by: Fiona Ebner <f.ebner@proxmox.com> [fixed long lines] Signed-off-by: Fabiano Rosas <farosas@suse.de> (cherry picked from commit 40a23ef643664b5c1021a9789f9d680b6294fb50) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2024-04-10Merge tag 'hw-misc-20240410' of https://github.com/philmd/qemu into stagingPeter Maydell
Misc HW patch queue - Fix CXL Fixed Memory Window interleave-granularity typo - Fix for DMA re-entrancy abuse with VirtIO devices (CVE-2024-3446) - Fix out-of-bound access in NAND block buffer - Fix memory leak in AppleSMC reset() handler - Avoid VirtIO crypto backends abort o invalid session ID - Fix overflow in LAN9118 MIL TX FIFO - Fix overflow when abusing SDHCI TRNMOD register (CVE-2024-3447) - Fix overrun in short fragmented packet SCTP checksum (CVE-2024-3567) - Remove unused assignment in virtio-snd model (Coverity 1542933 & 1542934) # -----BEGIN PGP SIGNATURE----- # # iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmYWV94ACgkQ4+MsLN6t # wN4+ew/+PqDmL4S8xXGQPi6Q8fxAogbwo1mPptDO2y8ChEjtc9LI5HOLu90EYz7A # s62SPDsh3gx8vOthrJVEk0LqCbw4N3s5dFdmHNrnjXCsKQFifgucQ+yZy8ipy34N # wWHSJ9nipBQLvkK23iCxkbl3cTyr44Rlweae/TZR4/FjFCEe3N555LQU0fruEqRo # AHW1RjYhGvOfL9knLWzIQqW2QjcCnKky3bJhwHh3crfWE69nvVJTkbSF6oUxWSG0 # RzSToK3nN5tmvUlyvbTBE9u0K9JkOcbtMQiAgj39nR9xpsaUZZa0zSWOmliYIuBC # kWuUY0/nAQk6gxHBKyu8q09ACBbzeCp+lVPOYXdxax8QMeURSa9fB1qY7JmI5QAZ # bg0ypD2pvbxhidU5TWpw7araAYyBOJrEYjnOkhXB4oa01ZWu2d0uNhGWo83h3Wjy # ahKrNDoVIQIdh8QkYy/ZqDwhCMoNM+pQcfUzsYxkqZC/JiiM/qxm87pTHQ/x2yQA # l0MLzljGv90/dklokrqeg4REwMqfwzc74PUbKdCk43saemmatslK3ktu3xAzUlQW # 2xmZQTnKwXDf+U3YnYryDddow2LsU7qlu8dlDGNd0WIrE5LRCCXzhv8la66O0jVE # qMOHpBPkwMlACBwiXuxV6ucelk4vy+XvabeQUsizm0m+PR7TwJY= # =9phd # -----END PGP SIGNATURE----- # gpg: Signature made Wed 10 Apr 2024 10:11:58 BST # gpg: using RSA key FAABE75E12917221DCFD6BB2E3E32C2CDEADC0DE # gpg: Good signature from "Philippe Mathieu-Daudé (F4BUG) <f4bug@amsat.org>" [full] # Primary key fingerprint: FAAB E75E 1291 7221 DCFD 6BB2 E3E3 2C2C DEAD C0DE * tag 'hw-misc-20240410' of https://github.com/philmd/qemu: hw/audio/virtio-snd: Remove unused assignment hw/net/net_tx_pkt: Fix overrun in update_sctp_checksum() hw/sd/sdhci: Do not update TRNMOD when Command Inhibit (DAT) is set hw/net/lan9118: Fix overflow in MIL TX FIFO hw/net/lan9118: Replace magic '2048' value by MIL_TXFIFO_SIZE definition backends/cryptodev: Do not abort for invalid session ID hw/misc/applesmc: Fix memory leak in reset() handler hw/misc/applesmc: Do not call DeviceReset from DeviceRealize hw/block/nand: Fix out-of-bound access in NAND block buffer hw/block/nand: Have blk_load() take unsigned offset and return boolean hw/block/nand: Factor nand_load_iolen() method out qemu-options: Fix CXL Fixed Memory Window interleave-granularity typo hw/virtio/virtio-crypto: Protect from DMA re-entrancy bugs hw/char/virtio-serial-bus: Protect from DMA re-entrancy bugs hw/display/virtio-gpu: Protect from DMA re-entrancy bugs hw/virtio: Introduce virtio_bh_new_guarded() helper Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-04-10hw/virtio: Introduce virtio_bh_new_guarded() helperPhilippe Mathieu-Daudé
Introduce virtio_bh_new_guarded(), similar to qemu_bh_new_guarded() but using the transport memory guard, instead of the device one (there can only be one virtio device per virtio bus). Inspired-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Gerd Hoffmann <kraxel@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20240409105537.18308-2-philmd@linaro.org>
2024-04-09accel/tcg: Improve can_do_io managementRichard Henderson
We already attempted to set and clear can_do_io before the first and last insns, but only used the initial value of max_insns and the call to translator_io_start to find those insns. Now that we track insn_start in DisasContextBase, and now that we have emit_before_op, we can wait until we have finished translation to identify the true first and last insns and emit the sets of can_do_io at that time. This fixes the case of a translation block which crossed a page boundary, and for which the second page turned out to be mmio. In this case we truncate the block, and the previous logic for can_do_io could leave a block with a single insn with can_do_io set to false, which would fail an assertion in cpu_io_recompile. Reported-by: Jørgen Hansen <Jorgen.Hansen@wdc.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Tested-by: Jørgen Hansen <Jorgen.Hansen@wdc.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-04-09accel/tcg: Add insn_start to DisasContextBaseRichard Henderson
This is currently target-specific for many; begin making it target independent. Tested-by: Jørgen Hansen <Jorgen.Hansen@wdc.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-04-09tcg: Add TCGContext.emit_before_opRichard Henderson
Allow operations to be emitted via normal expanders into the middle of the opcode stream. Tested-by: Jørgen Hansen <Jorgen.Hansen@wdc.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-04-09virtio-snd: rewrite invalid tx/rx message handlingManos Pitsidianakis
The current handling of invalid virtqueue elements inside the TX/RX virt queue handlers is wrong. They are added in a per-stream invalid queue to be processed after the handler is done examining each message, but the invalid message might not be specifying any stream_id; which means it's invalid to add it to any stream->invalid queue since stream could be NULL at this point. This commit moves the invalid queue to the VirtIOSound struct which guarantees there will always be a valid temporary place to store them inside the tx/rx handlers. The queue will be emptied before the handler returns, so the queue must be empty at any other point of the device's lifetime. Signed-off-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org> Message-Id: <virtio-snd-rewrite-invalid-tx-rx-message-handling-v1.manos.pitsidianakis@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-04-09Revert "hw/virtio: Add support for VDPA network simulation devices"Michael S. Tsirkin
This reverts commit cd341fd1ffded978b2aa0b5309b00be7c42e347c. The patch adds non-upstream code in include/standard-headers/linux/virtio_pci.h which would make maintainance harder. Revert for now. Suggested-by: Jason Wang <jasowang@redhat.com> Message-Id: <df6b6b465753e754a19459e8cd61416548f89a42.1712569644.git.mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-29Revert "tap: setting error appropriately when calling net_init_tap_one()"Akihiko Odaki
This reverts commit 46d4d36d0bf2b24b205f2f604f0905db80264eef. The reverted commit changed to emit warnings instead of errors when vhost is requested but vhost initialization fails if vhostforce option is not set. However, vhostforce is not meant to ignore vhost errors. It was once introduced as an option to commit 5430a28fe4 ("vhost: force vhost off for non-MSI guests") to force enabling vhost for non-MSI guests, which will have worse performance with vhost. The option was deprecated with commit 1e7398a140 ("vhost: enable vhost without without MSI-X") and changed to behave identical with the vhost option for compatibility. Worse, commit bf769f742c ("virtio: del net client if net_init_tap_one failed") changed to delete the client when vhost fails even when the failure only results in a warning. The leads to an assertion failure for the -netdev command line option. The reverted commit was intended to avoid that the vhost initialization failure won't result in a corrupted netdev. This problem should have been fixed by deleting netdev when the initialization fails instead of ignoring the failure with an arbitrary option. Fortunately, commit bf769f742c ("virtio: del net client if net_init_tap_one failed"), mentioned earlier, implements this behavior. Restore the correct semantics and fix the assertion failure for the -netdev command line option by reverting the problematic commit. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2024-03-26hw/clock: Let clock_set_mul_div() return a boolean valuePhilippe Mathieu-Daudé
Let clock_set_mul_div() return a boolean value whether the clock has been updated or not, similarly to clock_set(). Return early when clock_set_mul_div() is called with same mul/div values the clock has. Acked-by: Luc Michel <luc@lmichel.fr> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20240325152827.73817-2-philmd@linaro.org>
2024-03-25misc/pca955*: Move models under hw/gpioCédric Le Goater
The PCA9552 and PCA9554 devices are both I2C GPIO controllers and the PCA9552 also can drive LEDs. Do all the necessary adjustments to move the models under hw/gpio. Cc: Glenn Miles <milesg@linux.vnet.ibm.com> Signed-off-by: Cédric Le Goater <clg@redhat.com> Message-ID: <20240325134833.1484265-1-clg@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-03-20ui/curses: Do not use console_select()Akihiko Odaki
ui/curses is the only user of console_select(). Move the implementation to ui/curses. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20240319-console-v2-4-3fd6feef321a@daynix.com>
2024-03-20ui/vnc: Do not use console_select()Akihiko Odaki
console_select() is shared by other displays and a console_select() call from one of them triggers console switching also in ui/curses, circumventing key state reinitialization that needs to be performed in preparation and resulting in stuck keys. Use its internal state to track the current active console to prevent such a surprise console switch. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20240319-console-v2-2-3fd6feef321a@daynix.com>
2024-03-19Merge tag 'pull-for-9.0-20240319' of https://github.com/legoater/qemu into ↵Peter Maydell
staging aspeed, pnv, vfio queue: * user device fixes for Aspeed and PowerNV machines * coverity fix for iommufd # -----BEGIN PGP SIGNATURE----- # # iQIzBAABCAAdFiEEoPZlSPBIlev+awtgUaNDx8/77KEFAmX5mm0ACgkQUaNDx8/7 # 7KE/MQ/9GeX4yNBxY2iTATdmPXwjMw8AtKyfIQb605nIO0ch1Z98ywl5VMwCNohn # ppY9L5bFpEASgRlFVm73X4DGxKyRGpRPqylsvINh0hKciRpmRkELHY3llhnXsd7P # Q197pDtFr54FeX8j4+hSAu4paT97fPENlKn0J6lto2I1cXGcD1LYNDFhysoXdGme # brJgo7KjQJZPZ560ZewskL5FWf3G9EkRjpqd8y0G5OtNmAPgAaahOMHhDCXan182 # J89I9CHI5xN45MRfAs8JamSaj/GyNsr4h04WhPa0+VZQ5vsaeW2Ekt4ypj+oAV+p # wykhYzQk4ALZcmmph2flSAtLa7uheI+imyqubMthQCDj3G8onSQBMd5/4WRK6O49 # 0oE1DpPDEfhlJEQYxaYhOeqeA9iaP+w6V+yE+L5oGlMO66cR7GZsPu0x7kXailbH # IoHw9mO+vMkpuyeP7M3hA8WRFCdFpf1Nn1Ao5Jz3KoiTyJWlIvX5VSaj12sjddQ2 # fU9SKu2Q5QqS5uQGakkY64EyUy7RkGIX6zY2NIscVe2lfAfKf3mZwu7OIuLjEy5O # lRn35vWV8fOdRooKoDPTNcdBCaNPi+RApin8chOv5P+F+ie7+Twf9sb1AgH/pIcv # HptvTXbvSFNbbdb+OE8a5qsqTvnrN8d31IXzrWRYsJB07x2IyoA= # =zR3v # -----END PGP SIGNATURE----- # gpg: Signature made Tue 19 Mar 2024 14:00:13 GMT # gpg: using RSA key A0F66548F04895EBFE6B0B6051A343C7CFFBECA1 # gpg: Good signature from "Cédric Le Goater <clg@kaod.org>" [undefined] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: A0F6 6548 F048 95EB FE6B 0B60 51A3 43C7 CFFB ECA1 * tag 'pull-for-9.0-20240319' of https://github.com/legoater/qemu: aspeed/smc: Only wire flash devices at reset ppc/pnv: I2C controller is not user creatable vfio/iommufd: Fix memory leak Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-03-19aspeed/smc: Only wire flash devices at resetCédric Le Goater
The Aspeed machines have many Static Memory Controllers (SMC), up to 8, which can only drive flash memory devices. Commit 27a2c66c92ec ("aspeed/smc: Wire CS lines at reset") tried to ease the definitions of these devices by allowing flash devices from the command line to be attached to a SSI bus. For that, the wiring of the CS lines of the Aspeed SMC controller was moved at reset. Two assumptions are made though, first that the device has a SSI_GPIO_CS GPIO line, which is not always the case, and second that it is a flash device. Correct this problem by ensuring that the devices attached to the bus are of the correct flash type. This fixes a QEMU abort when devices without a CS line, such as the max111x, are passed on the command line. While at it, export TYPE_M25P80 used in the Xilinx Versal Virtual machine. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2228 Fixes: 27a2c66c92ec ("aspeed/smc: Wire CS lines at reset") Reported-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Tested-by: Thomas Huth <thuth@redhat.com> [ clg: minor fixes in the commit log ] Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-03-19Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu ↵Peter Maydell
into staging virtio,pc,pci: bugfixes Some minor fixes plus a big patchset from Igor fixing a regression with windows. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # -----BEGIN PGP SIGNATURE----- # # iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmX4NzsPHG1zdEByZWRo # YXQuY29tAAoJECgfDbjSjVRpkp0H/1foAaDYrApMiIkji4aI94bq/fwTnu5CshNP # +YEzwJCS4qbl67/Ix2Z+xVz7twjQbgGdLd6hb9ZypAQfclUk5tDoKyCmqHtQMakX # T080FayOvWmUEostAw7MXvuz0HpJlgnJaJBn29l1hHjA/XXahKqcc705cup+W8hv # F7xb6AoFcbdETMzNaoqekNaHiiYyQPITY9p/UYPLzj2zyLsspR9kBebIeA1yhtXw # Tmc3+FMquoM2fMNxpwfhCBswg662MlOXhLN3dmyLqeJRl09x1GvaeJIGMY2MbefM # RMMv0/jqwAyii5HXew2rPIbLdULGq+hSjZo2NOlx3EOjTCaOkXc= # =XGMp # -----END PGP SIGNATURE----- # gpg: Signature made Mon 18 Mar 2024 12:44:43 GMT # gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469 # gpg: issuer "mst@redhat.com" # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full] # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full] # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (24 commits) smbios: add extra comments to smbios_get_table_legacy() tests: acpi: update expected SSDT.dimmpxm blob pc/q35: set SMBIOS entry point type to 'auto' by default tests: acpi/smbios: whitelist expected blobs smbios: error out when building type 4 table is not possible smbios: in case of entry point is 'auto' try to build v2 tables 1st smbios: extend smbios-entry-point-type with 'auto' value smbios: clear smbios_type4_count before building tables smbios: get rid of global smbios_ep_type smbios: handle errors consistently smbios: build legacy mode code only for 'pc' machine smbios: rename/expose structures/bitmaps used by both legacy and modern code smbios: add smbios_add_usr_blob_size() helper smbios: don't check type4 structures in legacy mode smbios: avoid mangling user provided tables smbios: get rid of smbios_legacy global smbios: get rid of smbios_smp_sockets global smbios: cleanup smbios_get_tables() from legacy handling tests: smbios: add test for legacy mode CLI options tests: smbios: add test for -smbios type=11 option ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-03-18Merge tag 'for-upstream' of https://repo.or.cz/qemu/kevin into stagingPeter Maydell
Block layer patches - mirror: Fix deadlock - nbd/server: Fix race in draining the export - qemu-img snapshot: Fix formatting with large values - Fix blockdev-snapshot-sync error reporting for no medium - iotests fixes # -----BEGIN PGP SIGNATURE----- # # iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAmX4OG8RHGt3b2xmQHJl # ZGhhdC5jb20ACgkQfwmycsiPL9YdiQ//faXfGmbK6rBW4AkpwfrRM8SDHvm6hz7L # 043ujAi3ziSXXoiec2/RK5wZ27nMJkfIrRHXpH41hgQvC6/3a4eIW6KSTaFV1PdG # JtHCeopmVmgu7TZQ+kt/J6eLUTTLovoO94HgEfmxpr4CGZfx9RJftf2kCKILcYkh # 9r04zSZLByVd4FJ5ZrqsFulWif5mXoGKdT/YisY3tKiCwFRWQDOoTymvJA012VtO # MVmID593zwem3O3qtlGiGlK9qodBR4yof66xa/0gaYP98BZgv+LWnwLKha+OzSpX # bQlxT26LY4JnSQkTdjF0QYnQiH4Q1kveUcNRZrGpA4iZxVDq1aks5DisThDwqoGG # rhaPOWyJwJsonM1Enzim5Jd60JqvGdpTLjSA5oSyTjw62lAulnYihInERYSAFyyz # UhQaO7qSog1//RpPEXEsiVkJBq8BE9l5I+L7+l5SCBhNr/UwZAOer/4m4X6d0SKN # GEPRx0kH1voikzx7gIQs+Oldqvb0sg+zAvOynBxzpd+Ac6s8bFtWe+eSyWYL/ZGr # Jg9+PL1xir/Uh7KmOnzt/iVBAmfSRpAo1O72xQXvHFYYtIP7hTkPO/vzqF206WMc # WQFHHjfp5gVcMZ5AYg6txw+Bbtzu8g0AfB054lgnhihuShpf0E923TTDQFdV755s # NUlrzuGu2fs= # =+JIK # -----END PGP SIGNATURE----- # gpg: Signature made Mon 18 Mar 2024 12:49:51 GMT # gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6 # gpg: issuer "kwolf@redhat.com" # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * tag 'for-upstream' of https://repo.or.cz/qemu/kevin: iotests: adapt to output change for recently introduced 'detached header' field tests/qemu-iotests: Restrict tests using "--blockdev file" to the file protocol tests/qemu-iotests: Fix some tests that use --image-opts for other protocols tests/qemu-iotests: Restrict tests that use --image-opts to the 'file' protocol tests/qemu-iotests: Restrict test 156 to the 'file' protocol tests/qemu-iotests: Restrict test 134 and 158 to the 'file' protocol tests/qemu-iotests: Restrict test 130 to the 'file' protocol tests/qemu-iotests: Restrict test 114 to the 'file' protocol tests/qemu-iotests: Restrict test 066 to the 'file' protocol tests/qemu-iotests: Fix test 033 for running with non-file protocols qemu-img: Fix Column Width and Improve Formatting in snapshot list blockdev: Fix blockdev-snapshot-sync error reporting for no medium iotests: Add test for reset/AioContext switches with NBD exports nbd/server: Fix race in draining the export mirror: Don't call job_pause_point() under graph lock Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-03-18Merge tag 'migration-20240317-pull-request' of ↵Peter Maydell
https://gitlab.com/peterx/qemu into staging Migration pull for 9.0-rc0 - Nicholas/Phil's fix on migration corruption / inconsistent for tcg - Cedric's fix on block migration over n_sectors==0 - Steve's CPR reboot documentation page - Fabiano's misc fixes on mapped-ram (IOC leak, dup() errors, fd checks, fd use race, etc.) # -----BEGIN PGP SIGNATURE----- # # iIgEABYKADAWIQS5GE3CDMRX2s990ak7X8zN86vXBgUCZfdZEhIccGV0ZXJ4QHJl # ZGhhdC5jb20ACgkQO1/MzfOr1wa+1AEA0+f7nCssvsILvCY9KifYO+OUJsLodUuQ # JW0JBz+1iPMA+wSiyIVl2Xg78Q97nJxv71UJf+1cDJENA5EMmXMnxmYK # =SLnA # -----END PGP SIGNATURE----- # gpg: Signature made Sun 17 Mar 2024 20:56:50 GMT # gpg: using EDDSA key B9184DC20CC457DACF7DD1A93B5FCCCDF3ABD706 # gpg: issuer "peterx@redhat.com" # gpg: Good signature from "Peter Xu <xzpeter@gmail.com>" [marginal] # gpg: aka "Peter Xu <peterx@redhat.com>" [marginal] # gpg: WARNING: This key is not certified with sufficiently trusted signatures! # gpg: It is not certain that the signature belongs to the owner. # Primary key fingerprint: B918 4DC2 0CC4 57DA CF7D D1A9 3B5F CCCD F3AB D706 * tag 'migration-20240317-pull-request' of https://gitlab.com/peterx/qemu: migration/multifd: Duplicate the fd for the outgoing_args migration/multifd: Ensure we're not given a socket for file migration migration: Fix iocs leaks during file and fd migration migration: cpr-reboot documentation migration: Skip only empty block devices physmem: Fix migration dirty bitmap coherency with TCG memory access physmem: Factor cpu_physical_memory_dirty_bits_cleared() out physmem: Expose tlb_reset_dirty_range_all() migration: Fix error handling after dup in file migration io: Introduce qio_channel_file_new_dupfd Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-03-18smbios: get rid of global smbios_ep_typeIgor Mammedov
Signed-off-by: Igor Mammedov <imammedo@redhat.com> Acked-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Ani Sinha <anisinha@redhat.com> Tested-by: Fiona Ebner <f.ebner@proxmox.com> Message-Id: <20240314152302.2324164-14-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-18smbios: handle errors consistentlyIgor Mammedov
Current code uses mix of error_report()+exit(1) and error_setg() to handle errors. Use newer error_setg() everywhere, beside consistency it will allow to detect error condition without killing QEMU and attempt switch-over to SMBIOS3.x tables/entrypoint in follow up patch. while at it, clear smbios_tables pointer after freeing. that will avoid double free if smbios_get_tables() is called multiple times. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Ani Sinha <anisinha@redhat.com> Message-Id: <20240314152302.2324164-13-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-18smbios: build legacy mode code only for 'pc' machineIgor Mammedov
basically moving code around without functional change. And exposing some symbols so that they could be shared between smbbios.c and new smbios_legacy.c plus some meson magic to build smbios_legacy.c only for 'pc' machine and otherwise replace it with stub if not selected. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Ani Sinha <anisinha@redhat.com> Message-Id: <20240314152302.2324164-12-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-18smbios: rename/expose structures/bitmaps used by both legacy and modern codeIgor Mammedov
As a preparation to move legacy handling into a separate file, add prefix 'smbios_' to type0/type1/have_binfile_bitmap/have_fields_bitmap and expose them in smbios.h so that they can be reused in legacy and modern code. Doing it as a separate patch to avoid rename cluttering follow-up patch which will move legacy code into a separate file. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Ani Sinha <anisinha@redhat.com> Message-Id: <20240314152302.2324164-11-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-18smbios: don't check type4 structures in legacy modeIgor Mammedov
legacy mode doesn't support structures of type 2 and more, and CLI has a check for '-smbios type' option, however it's still possible to sneak in type4 as a blob with '-smbios file' option. However doing the later makes SMBIOS tables broken since SeaBIOS doesn't expect that. Rather than trying to add support for type4 to legacy code (both QEMU and SeaBIOS), simplify smbios_get_table_legacy() by dropping not relevant check in legacy code and error out on type4 blob. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Ani Sinha <anisinha@redhat.com> Tested-by: Fiona Ebner <f.ebner@proxmox.com> Message-Id: <20240314152302.2324164-9-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-18smbios: get rid of smbios_legacy globalIgor Mammedov
clean up smbios_set_defaults() which is reused by legacy and non legacy machines from being aware of 'legacy' notion and need to turn it off. And push legacy handling up to PC machine code where it's relevant. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Ani Sinha <anisinha@redhat.com> Acked-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Tested-by: Fiona Ebner <f.ebner@proxmox.com> Message-Id: <20240314152302.2324164-7-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-18smbios: get rid of smbios_smp_sockets globalIgor Mammedov
it makes smbios_validate_table() independent from smbios_smp_sockets global, which in turn lets smbios_get_tables() avoid using not related legacy code. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Ani Sinha <anisinha@redhat.com> Tested-by: Fiona Ebner <f.ebner@proxmox.com> Message-Id: <20240314152302.2324164-6-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-18mirror: Don't call job_pause_point() under graph lockKevin Wolf
Calling job_pause_point() while holding the graph reader lock potentially results in a deadlock: bdrv_graph_wrlock() first drains everything, including the mirror job, which pauses it. The job is only unpaused at the end of the drain section, which is when the graph writer lock has been successfully taken. However, if the job happens to be paused at a pause point where it still holds the reader lock, the writer lock can't be taken as long as the job is still paused. Mark job_pause_point() as GRAPH_UNLOCKED and fix mirror accordingly. Cc: qemu-stable@nongnu.org Buglink: https://issues.redhat.com/browse/RHEL-28125 Fixes: 004915a96a7a ("block: Protect bs->backing with graph_lock") Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20240313153000.33121-1-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2024-03-13Merge tag 'pull-maintainer-final-130324-1' of ↵Peter Maydell
https://gitlab.com/stsquad/qemu into staging final updates for 9.0 (testing, gdbstub): - fix the over rebuilding of test VMs - support Xfer:siginfo:read in gdbstub - fix double close() in gdbstub # -----BEGIN PGP SIGNATURE----- # # iQEzBAABCgAdFiEEZoWumedRZ7yvyN81+9DbCVqeKkQFAmXxkb0ACgkQ+9DbCVqe # KkSw9wf+K+3kJYaZ2unEFku3Y6f4Z9XkrZCsFQFVNIJQgpYVc6peQyLUB1pZwzZc # yoQhmTIgej16iRZc7gEcJhFl2zlX2vulE/m+wiaR0Chv3E2r510AGn4aWl+GLB9+ # /WduHaz1NobPW4JWaarxespa84Re8QZQgqkHX4nwYd++FW63E4uxydL4F1nmSNca # eTA6RwS48h4wqPzHBX72hYTRUnYrDUSSGCGUDzK3NHumuPi+AQ77GLRMO0MTYFfy # hWriapogCmghY+Xtn++eUIwDyh1CCnUT6Ntf5Qj06bZ+f6eaTwINM8QWhj9mxYX+ # 5/F5Q4JJDqRPYw/hF4wYXRsiZxTYFw== # =BOWW # -----END PGP SIGNATURE----- # gpg: Signature made Wed 13 Mar 2024 11:45:01 GMT # gpg: using RSA key 6685AE99E75167BCAFC8DF35FBD0DB095A9E2A44 # gpg: Good signature from "Alex Bennée (Master Work Key) <alex.bennee@linaro.org>" [full] # Primary key fingerprint: 6685 AE99 E751 67BC AFC8 DF35 FBD0 DB09 5A9E 2A44 * tag 'pull-maintainer-final-130324-1' of https://gitlab.com/stsquad/qemu: gdbstub: Fix double close() of the follow-fork-mode socket tests/tcg: Add multiarch test for Xfer:siginfo:read stub gdbstub: Add Xfer:siginfo:read stub gdbstub: Save target's siginfo linux-user: Move tswap_siginfo out of target code gdbstub: Rename back gdb_handlesig tests/vm: ensure we build everything by default Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-03-13Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu ↵Peter Maydell
into staging virtio,pc,pci: features, cleanups, fixes more memslots support in libvhost-user support PCIe Gen5/Gen6 link speeds in pcie more traces in vdpa network simulation devices support in vdpa SMBIOS type 9 descriptor implementation Bump max_cpus to 4096 vcpus in q35 aw-bits and granule options in VIRTIO-IOMMU Support report NUMA nodes for device memory using GI in acpi Beginning of shutdown event support in pvpanic fixes, cleanups all over the place. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # -----BEGIN PGP SIGNATURE----- # # iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmXw0TMPHG1zdEByZWRo # YXQuY29tAAoJECgfDbjSjVRp8x4H+gLMoGwaGAX7gDGPgn2Ix4j/3kO77ZJ9X9k/ # 1KqZu/9eMS1j2Ei+vZqf05w7qRjxxhwDq3ilEXF/+UFqgAehLqpRRB8j5inqvzYt # +jv0DbL11PBp/oFjWcytm5CbiVsvq8KlqCF29VNzc162XdtcduUOWagL96y8lJfZ # uPrOoyeR7SMH9lp3LLLHWgu+9W4nOS03RroZ6Umj40y5B7yR0Rrppz8lMw5AoQtr # 0gMRnFhYXeiW6CXdz+Tzcr7XfvkkYDi/j7ibiNSURLBfOpZa6Y8+kJGKxz5H1K1G # 6ZY4PBcOpQzl+NMrktPHogczgJgOK10t+1i/R3bGZYw2Qn/93Eg= # =C0UU # -----END PGP SIGNATURE----- # gpg: Signature made Tue 12 Mar 2024 22:03:31 GMT # gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469 # gpg: issuer "mst@redhat.com" # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full] # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full] # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (68 commits) docs/specs/pvpanic: document shutdown event hw/cxl: Fix missing reserved data in CXL Device DVSEC hmat acpi: Fix out of bounds access due to missing use of indirection hmat acpi: Do not add Memory Proximity Domain Attributes Structure targetting non existent memory. qemu-options.hx: Document the virtio-iommu-pci aw-bits option hw/arm/virt: Set virtio-iommu aw-bits default value to 48 hw/i386/q35: Set virtio-iommu aw-bits default value to 39 virtio-iommu: Add an option to define the input range width virtio-iommu: Trace domain range limits as unsigned int qemu-options.hx: Document the virtio-iommu-pci granule option virtio-iommu: Change the default granule to the host page size virtio-iommu: Add a granule property hw/i386/acpi-build: Add support for SRAT Generic Initiator structures hw/acpi: Implement the SRAT GI affinity structure qom: new object to associate device to NUMA node hw/i386/pc: Inline pc_cmos_init() into pc_cmos_init_late() and remove it hw/i386/pc: Set "normal" boot device order in pc_basic_device_init() hw/i386/pc: Avoid one use of the current_machine global hw/i386/pc: Remove "rtc_state" link again Revert "hw/i386/pc: Confine system flash handling to pc_sysfw" ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org> # Conflicts: # hw/core/machine.c
2024-03-13Merge tag 'pull-ppc-for-9.0-2-20240313' of https://gitlab.com/npiggin/qemu ↵Peter Maydell
into staging * PAPR nested hypervisor host implementation for spapr TCG * excp_helper.c code cleanups and improvements * Move more ops to decodetree * Deprecate pseries-2.12 machines and P9 and P10 DD1.0 CPUs * Document running Linux on AmigaNG * Update dt feature advertising POWER CPUs. * Add P10 PMU SPRs * Improve pnv topology calculation for SMT8 CPUs. * Various bug fixes. # -----BEGIN PGP SIGNATURE----- # # iQIzBAABCgAdFiEETkN92lZhb0MpsKeVZ7MCdqhiHK4FAmXwiT8ACgkQZ7MCdqhi # HK7C/w//XxEO2bQTFPLFDTrP/voq7pcX8XeQNVyXCkXYjvsbu05oQow50k+Y5UAE # US4MFjt8jFz0vuIKuKyoA3kG41zDSOzoX4TQXMM+tyTWbuFF3KAyfizb1xE6SYAN # xJEGvmiXv/EgoSBD7BTKQp1tMPdIGZLwSdYiA0lmOo7YaMCgYAXaujW5hnNjQecT # 873sN+10pHtQY++mINtD9Nfb6AcDGMWw0b+bykqIXhNRkI8IGOS4WF4vAuMBrwfe # UM00wDnNRb86Dk14bv2XVNDr6/i0VRtUMwM4yiptrQ1TQx18LZaPSQFYjQfPaan7 # LwN4QkMFnBX54yJ7Npvjvu8BCBF47kwOVu4CIAFJ4sIm0WfTmozDpPttwcZ5w7Ve # iXDOB9ECAB4pQ2rCgbSNG8MYUZgoHHOuThqolOP0Vh9NHRRJxpdw6CyAbmCGftc0 # lvRDPFiKp8xmCNJ/j3XzoUdHoG7NMwpUmHv9ruGU18SdQ8hyJN9AcQGWYrB4v0RV # /hs2RAbwntG7ahkcwd8uy5aFw88Wph/uGXPXc49EWj7i49vHeIV2y5+gtthMywje # qqjFXkistXuF+JHVnyoYmqqCyXaHX5CEwtawMv4EQeaJs76bLhMeMTKKl9rRp8qB # DtbIZphO8iMsocrBnje48sA5HR0PM+H4HTjw10i8R0fLlWitaIY= # =XnY5 # -----END PGP SIGNATURE----- # gpg: Signature made Tue 12 Mar 2024 16:56:31 GMT # gpg: using RSA key 4E437DDA56616F4329B0A79567B30276A8621CAE # gpg: Good signature from "Nicholas Piggin <npiggin@gmail.com>" [unknown] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: 4E43 7DDA 5661 6F43 29B0 A795 67B3 0276 A862 1CAE * tag 'pull-ppc-for-9.0-2-20240313' of https://gitlab.com/npiggin/qemu: (38 commits) spapr: nested: Introduce cap-nested-papr for Nested PAPR API spapr: nested: Introduce H_GUEST_RUN_VCPU hcall. spapr: nested: Use correct source for parttbl info for nested PAPR API. spapr: nested: Introduce H_GUEST_[GET|SET]_STATE hcalls. spapr: nested: Initialize the GSB elements lookup table. spapr: nested: Extend nested_ppc_state for nested PAPR API spapr: nested: Introduce H_GUEST_CREATE_VCPU hcall. spapr: nested: Introduce H_GUEST_[CREATE|DELETE] hcalls. spapr: nested: Introduce H_GUEST_[GET|SET]_CAPABILITIES hcalls. spapr: nested: Document Nested PAPR API spapr: nested: keep nested-hv related code restricted to its API. spapr: nested: Introduce SpaprMachineStateNested to store related info. spapr: nested: move nested part of spapr_get_pate into spapr_nested.c spapr: nested: register nested-hv api hcalls only for cap-nested-hv target/ppc: Remove interrupt handler wrapper functions target/ppc: Clean up ifdefs in excp_helper.c, part 3 target/ppc: Clean up ifdefs in excp_helper.c, part 2 target/ppc: Clean up ifdefs in excp_helper.c, part 1 target/ppc: Add gen_exception_err_nip() function target/ppc: Readability improvements in exception handlers ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-03-13gdbstub: Save target's siginfoGustavo Romero
Save target's siginfo into gdbserver_state so it can be used later, for example, in any stub that requires the target's si_signo and si_code. This change affects only linux-user mode. Signed-off-by: Gustavo Romero <gustavo.romero@linaro.org> Suggested-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20240309030901.1726211-4-gustavo.romero@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2024-03-13gdbstub: Rename back gdb_handlesigGustavo Romero
Rename gdb_handlesig_reason back to gdb_handlesig. There is no need to add a wrapper for gdb_handlesig and rename it when a new parameter is added. Signed-off-by: Gustavo Romero <gustavo.romero@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20240309030901.1726211-2-gustavo.romero@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
2024-03-12hw/cxl: Fix missing reserved data in CXL Device DVSECJonathan Cameron
The r3.1 specification introduced a new 2 byte field, but to maintain DWORD alignment, a additional 2 reserved bytes were added. Forgot those in updating the structure definition but did include them in the size define leading to a buffer overrun. Also use the define so that we don't duplicate the value. Fixes: Coverity ID 1534095 buffer overrun Fixes: 8700ee15de ("hw/cxl: Standardize all references on CXL r3.1 and minor updates") Reported-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Message-Id: <20240308143831.6256-1-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12virtio-iommu: Add an option to define the input range widthEric Auger
aw-bits is a new option that allows to set the bit width of the input address range. This value will be used as a default for the device config input_range.end. By default it is set to 64 bits which is the current value. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Message-Id: <20240307134445.92296-7-eric.auger@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12virtio-iommu: Add a granule propertyEric Auger
This allows to choose which granule will be used by default by the virtio-iommu. Current page size mask default is qemu_target_page_mask so this translates into a 4k granule on ARM and x86_64 where virtio-iommu is supported. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Message-Id: <20240307134445.92296-3-eric.auger@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12hw/acpi: Implement the SRAT GI affinity structureAnkit Agrawal
ACPI spec provides a scheme to associate "Generic Initiators" [1] (e.g. heterogeneous processors and accelerators, GPUs, and I/O devices with integrated compute or DMA engines GPUs) with Proximity Domains. This is achieved using Generic Initiator Affinity Structure in SRAT. During bootup, Linux kernel parse the ACPI SRAT to determine the PXM ids and create a NUMA node for each unique PXM ID encountered. Qemu currently do not implement these structures while building SRAT. Add GI structures while building VM ACPI SRAT. The association between device and node are stored using acpi-generic-initiator object. Lookup presence of all such objects and use them to build these structures. The structure needs a PCI device handle [2] that consists of the device BDF. The vfio-pci device corresponding to the acpi-generic-initiator object is located to determine the BDF. [1] ACPI Spec 6.3, Section 5.2.16.6 [2] ACPI Spec 6.3, Table 5.80 Cc: Jonathan Cameron <qemu-devel@nongnu.org> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Cedric Le Goater <clg@redhat.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Ankit Agrawal <ankita@nvidia.com> Message-Id: <20240308145525.10886-3-ankita@nvidia.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12qom: new object to associate device to NUMA nodeAnkit Agrawal
NVIDIA GPU's support MIG (Mult-Instance GPUs) feature [1], which allows partitioning of the GPU device resources (including device memory) into several (upto 8) isolated instances. Each of the partitioned memory needs a dedicated NUMA node to operate. The partitions are not fixed and they can be created/deleted at runtime. Unfortunately Linux OS does not provide a means to dynamically create/destroy NUMA nodes and such feature implementation is not expected to be trivial. The nodes that OS discovers at the boot time while parsing SRAT remains fixed. So we utilize the Generic Initiator (GI) Affinity structures that allows association between nodes and devices. Multiple GI structures per BDF is possible, allowing creation of multiple nodes by exposing unique PXM in each of these structures. Implement the mechanism to build the GI affinity structures as Qemu currently does not. Introduce a new acpi-generic-initiator object to allow host admin link a device with an associated NUMA node. Qemu maintains this association and use this object to build the requisite GI Affinity Structure. When multiple NUMA nodes are associated with a device, it is required to create those many number of acpi-generic-initiator objects, each representing a unique device:node association. Following is one of a decoded GI affinity structure in VM ACPI SRAT. [0C8h 0200 1] Subtable Type : 05 [Generic Initiator Affinity] [0C9h 0201 1] Length : 20 [0CAh 0202 1] Reserved1 : 00 [0CBh 0203 1] Device Handle Type : 01 [0CCh 0204 4] Proximity Domain : 00000007 [0D0h 0208 16] Device Handle : 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 [0E0h 0224 4] Flags (decoded below) : 00000001 Enabled : 1 [0E4h 0228 4] Reserved2 : 00000000 [0E8h 0232 1] Subtable Type : 05 [Generic Initiator Affinity] [0E9h 0233 1] Length : 20 An admin can provide a range of acpi-generic-initiator objects, each associating a device (by providing the id through pci-dev argument) to the desired NUMA node (using the node argument). Currently, only PCI device is supported. For the grace hopper system, create a range of 8 nodes and associate that with the device using the acpi-generic-initiator object. While a configuration of less than 8 nodes per device is allowed, such configuration will prevent utilization of the feature to the fullest. The following sample creates 8 nodes per PCI device for a VM with 2 PCI devices and link them to the respecitve PCI device using acpi-generic-initiator objects: -numa node,nodeid=2 -numa node,nodeid=3 -numa node,nodeid=4 \ -numa node,nodeid=5 -numa node,nodeid=6 -numa node,nodeid=7 \ -numa node,nodeid=8 -numa node,nodeid=9 \ -device vfio-pci-nohotplug,host=0009:01:00.0,bus=pcie.0,addr=04.0,rombar=0,id=dev0 \ -object acpi-generic-initiator,id=gi0,pci-dev=dev0,node=2 \ -object acpi-generic-initiator,id=gi1,pci-dev=dev0,node=3 \ -object acpi-generic-initiator,id=gi2,pci-dev=dev0,node=4 \ -object acpi-generic-initiator,id=gi3,pci-dev=dev0,node=5 \ -object acpi-generic-initiator,id=gi4,pci-dev=dev0,node=6 \ -object acpi-generic-initiator,id=gi5,pci-dev=dev0,node=7 \ -object acpi-generic-initiator,id=gi6,pci-dev=dev0,node=8 \ -object acpi-generic-initiator,id=gi7,pci-dev=dev0,node=9 \ -numa node,nodeid=10 -numa node,nodeid=11 -numa node,nodeid=12 \ -numa node,nodeid=13 -numa node,nodeid=14 -numa node,nodeid=15 \ -numa node,nodeid=16 -numa node,nodeid=17 \ -device vfio-pci-nohotplug,host=0009:01:01.0,bus=pcie.0,addr=05.0,rombar=0,id=dev1 \ -object acpi-generic-initiator,id=gi8,pci-dev=dev1,node=10 \ -object acpi-generic-initiator,id=gi9,pci-dev=dev1,node=11 \ -object acpi-generic-initiator,id=gi10,pci-dev=dev1,node=12 \ -object acpi-generic-initiator,id=gi11,pci-dev=dev1,node=13 \ -object acpi-generic-initiator,id=gi12,pci-dev=dev1,node=14 \ -object acpi-generic-initiator,id=gi13,pci-dev=dev1,node=15 \ -object acpi-generic-initiator,id=gi14,pci-dev=dev1,node=16 \ -object acpi-generic-initiator,id=gi15,pci-dev=dev1,node=17 \ Link: https://www.nvidia.com/en-in/technologies/multi-instance-gpu [1] Cc: Jonathan Cameron <qemu-devel@nongnu.org> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Markus Armbruster <armbru@redhat.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Ankit Agrawal <ankita@nvidia.com> Message-Id: <20240308145525.10886-2-ankita@nvidia.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12hw/i386/pc: Inline pc_cmos_init() into pc_cmos_init_late() and remove itBernhard Beschow
Now that pc_cmos_init() doesn't populate the X86MachineState::rtc attribute any longer, its duties can be merged into pc_cmos_init_late() which is called within machine_done notifier. This frees pc_piix and pc_q35 from explicit CMOS initialization. Signed-off-by: Bernhard Beschow <shentey@gmail.com> Message-Id: <20240303185332.1408-5-shentey@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12Revert "hw/i386/pc: Confine system flash handling to pc_sysfw"Bernhard Beschow
Specifying the property `-M pflash0` results in a regression: qemu-system-x86_64: Property 'pc-q35-9.0-machine.pflash0' not found Revert the change for now until a solution is found. This reverts commit 6f6ad2b24582593d8feb00434ce2396840666227. Reported-by: Volker Rümelin <vr_qemu@t-online.de> Signed-off-by: Bernhard Beschow <shentey@gmail.com> Message-Id: <20240226215909.30884-3-shentey@gmail.com> Tested-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12pcie_sriov: Reset SR-IOV extended capabilityAkihiko Odaki
pcie_sriov_pf_disable_vfs() is called when resetting the PF, but it only disables VFs and does not reset SR-IOV extended capability, leaking the state and making the VF Enable register inconsistent with the actual state. Replace pcie_sriov_pf_disable_vfs() with pcie_sriov_pf_reset(), which does not only disable VFs but also resets the capability. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-Id: <20240228-reuse-v8-3-282660281e60@daynix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Sriram Yagnaraman <sriram.yagnaraman@ericsson.com>
2024-03-12Implement SMBIOS type 9 v2.6Felix Wu
Signed-off-by: Felix Wu <flwu@google.com> Signed-off-by: Nabih Estefan <nabihestefan@google.com> Message-Id: <20240221170027.1027325-3-nabihestefan@google.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12Implement base of SMBIOS type 9 descriptor.Felix Wu
Version 2.1+. Signed-off-by: Felix Wu <flwu@google.com> Signed-off-by: Nabih Estefan <nabihestefan@google.com> Message-Id: <20240221170027.1027325-2-nabihestefan@google.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12hw/virtio: Add support for VDPA network simulation devicesHao Chen
This patch adds support for VDPA network simulation devices. The device is developed based on virtio-net and tap backend, and supports hardware live migration function. For more details, please refer to "docs/system/devices/vdpa-net.rst" Signed-off-by: Hao Chen <chenh@yusur.tech> Message-Id: <20240221073802.2888022-1-chenh@yusur.tech> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12hw/audio/virtio-sound: return correct command response sizeVolker Rümelin
The payload size returned by command VIRTIO_SND_R_PCM_INFO is wrong. The code in process_cmd() assumes that all commands return only a virtio_snd_hdr payload, but some commands like VIRTIO_SND_R_PCM_INFO may return an additional payload. Add a zero initialized payload_size variable to struct virtio_snd_ctrl_command to allow for additional payloads. Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Volker Rümelin <vr_qemu@t-online.de> Message-Id: <20240218083351.8524-1-vr_qemu@t-online.de> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12hw/pci-bridge/pxb-cxl: Drop RAS capability from host bridge.Jonathan Cameron
This CXL component isn't allowed to have a RAS capability. Whilst this should be harmless as software is not expected to look here, good to clean it up. Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Message-Id: <20240215155206.2736-1-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>