slackcoder/qemu - QEMU is a generic and open source machine & userspace emulator and virtualizer

Age	Commit message (Collapse)	Author
2021-03-23	memory: Add offset_in_region to flatview_cb arguments	Peter Maydell
	The function flatview_for_each_range() calls a callback for each range in a FlatView. Currently the callback gets the start and length of the range and the MemoryRegion involved, but not the offset within the MemoryRegion. Add this to the callback's arguments; we're going to want it for a new use in the next commit. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 20210318174823.18066-4-peter.maydell@linaro.org
2021-03-23	memory: Document flatview_for_each_range()	Peter Maydell
	Add a documentation comment describing flatview_for_each_range(). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 20210318174823.18066-3-peter.maydell@linaro.org
2021-03-23	memory: Make flatview_cb return bool, not int	Peter Maydell
	The return value of the flatview_cb callback passed to the flatview_for_each_range() function is zero if the iteration through the ranges should continue, or non-zero to break out of it. Use a bool for this rather than int. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 20210318174823.18066-2-peter.maydell@linaro.org
2021-03-23	hw/arm/virt: Disable pl011 clock migration if needed	Gavin Shan
	A clock is added by commit aac63e0e6ea3 ("hw/char/pl011: add a clock input") since v5.2.0 which corresponds to virt-5.2 machine type. It causes backwards migration failure from upstream to downstream (v5.1.0) when the machine type is specified with virt-5.1. This fixes the issue by following instructions from section "Connecting subsections to properties" in docs/devel/migration.rst. With this applied, the PL011 clock is migrated based on the machine type. virt-5.2 or newer: migration virt-5.1 or older: non-migration Cc: qemu-stable@nongnu.org # v5.2.0+ Fixes: aac63e0e6ea3 ("hw/char/pl011: add a clock input") Suggested-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Message-id: 20210318023801.18287-1-gshan@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-03-23	edid: prefer standard timings	Gerd Hoffmann
	Windows guests using the "Basic Display Adapter" don't parse the "Established timings III" block. They also don't parse any edid extension. So prefer the "Standard Timings" block to store the display resolutions in edid_fill_modes(). Also reorder the mode list, so more exotic resolutions (specifically the ones which are not supported by vgabios) are moved down and the remaining ones have a better chance to get one of the eight slots in the "Standard Timings" block. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Message-Id: <20210316143812.2363588-6-kraxel@redhat.com>
2021-03-23	xen-block: Fix removal of backend instance via xenstore	Anthony PERARD
	Whenever a Xen block device is detach via xenstore, the image associated with it remained open by the backend QEMU and an error is logged: qemu-system-i386: failed to destroy drive: Node xvdz-qcow2 is in use This happened since object_unparent() doesn't immediately frees the object and thus keep a reference to the node we are trying to free. The reference is hold by the "drive" property and the call xen_block_drive_destroy() fails. In order to fix that, we call drain_call_rcu() to run the callback setup by bus_remove_child() via object_unparent(). Fixes: 2d24a6466154 ("device-core: use RCU for list of children of a bus") Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Reviewed-by: Paul Durrant <paul@xen.org> Message-Id: <20210308143232.83388-1-anthony.perard@citrix.com>
2021-03-23	Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging	Peter Maydell
	pc,virtio,pci: fixes, features Fixes all over the place. ACPI index support. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # gpg: Signature made Mon 22 Mar 2021 22:58:45 GMT # gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469 # gpg: issuer "mst@redhat.com" # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full] # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full] # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * remotes/mst/tags/for_upstream: acpi: Move setters/getters of oem fields to X86MachineState acpi: Set proper maximum size for "etc/acpi/rsdp" blob acpi: Move maximum size logic into acpi_add_rom_blob() microvm: Don't open-code "etc/table-loader" acpi: Set proper maximum size for "etc/table-loader" blob tests: acpi: update expected blobs pci: acpi: add _DSM method to PCI devices acpi: add aml_to_decimalstring() and aml_call6() helpers pci: acpi: ensure that acpi-index is unique pci: introduce acpi-index property for PCI device tests: acpi: temporary whitelist DSDT changes virtio-pmem: fix virtio_pmem_resp assign problem vhost-user: Monitor slave channel in vhost_user_read() vhost-user: Introduce nested event loop in vhost_user_read() vhost-user: Convert slave channel to QIOChannelSocket vhost-user: Factor out duplicated slave_fd teardown code vhost-user: Fix double-close on slave_read() error path vhost-user: Drop misleading EAGAIN checks in slave_read() virtio: Fix virtio_mmio_read()/virtio_mmio_write() Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-03-23	include/ui/console.h: Delete is_surface_bgr()	Peter Maydell
	The function is_surface_bgr() is no longer used anywhere, so we can delete it. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20210314163927.1184-1-peter.maydell@linaro.org> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2021-03-23	qmp: add new qmp display-reload	Zihao Chang
	This patch provides a new qmp to reload display configuration without restart VM, but only reloading the vnc tls certificates is implemented. Example: {"execute": "display-reload", "arguments":{"type": "vnc", "tls-certs": true}} Signed-off-by: Zihao Chang <changzihao1@huawei.com> Message-Id: <20210316075845.1476-4-changzihao1@huawei.com> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2021-03-23	vnc: support reload x509 certificates for vnc	Zihao Chang
	This patch add vnc_display_reload_certs() to support update x509 certificates. Signed-off-by: Zihao Chang <changzihao1@huawei.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20210316075845.1476-3-changzihao1@huawei.com> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2021-03-23	crypto: add reload for QCryptoTLSCredsClass	Zihao Chang
	This patch adds reload interface for QCryptoTLSCredsClass and implements the interface for QCryptoTLSCredsX509. Signed-off-by: Zihao Chang <changzihao1@huawei.com> Acked-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20210316075845.1476-2-changzihao1@huawei.com> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2021-03-22	target/riscv: Prevent lost illegal instruction exceptions	Georg Kotheimer
	When decode_insn16() fails, we fall back to decode_RV32_64C() for further compressed instruction decoding. However, prior to this change, we did not raise an illegal instruction exception, if decode_RV32_64C() fails to decode the instruction. This means that we skipped illegal compressed instructions instead of raising an illegal instruction exception. Instead of patching decode_RV32_64C(), we can just remove it, as it is dead code since f330433b363 anyway. Signed-off-by: Georg Kotheimer <georg.kotheimer@kernkonzept.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20210322121609.3097928-1-georg.kotheimer@kernkonzept.com Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2021-03-22	docs/system: riscv: Add documentation for 'microchip-icicle-kit' machine	Bin Meng
	This adds the documentation to describe what is supported for the 'microchip-icicle-kit' machine, and how to boot the machine in QEMU. Signed-off-by: Bin Meng <bin.meng@windriver.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-id: 20210322075248.136255-2-bmeng.cn@gmail.com Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2021-03-22	hw/riscv: microchip_pfsoc: Map EMMC/SD mux register	Bin Meng
	Since HSS commit c20a89f8dcac, the Icicle Kit reference design has been updated to use a register mapped at 0x4f000000 instead of a GPIO to control whether eMMC or SD card is to be used. With this support the same HSS image can be used for both eMMC and SD card boot flow, while previously two different board configurations were used. This is undocumented but one can take a look at the HSS code HSS_MMCInit() in services/mmc/mmc_api.c. With this commit, HSS image built from 2020.12 release boots again. Signed-off-by: Bin Meng <bin.meng@windriver.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-id: 20210322075248.136255-1-bmeng.cn@gmail.com Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2021-03-22	hw/block: m25p80: Support fast read for SST flashes	Bin Meng
	Per SST25VF016B datasheet [1], SST flash requires a dummy byte after the address bytes. Note only SPI mode is supported by SST flashes. [1] http://ww1.microchip.com/downloads/en/devicedoc/s71271_04.pdf Signed-off-by: Bin Meng <bin.meng@windriver.com> Acked-by: Alistair Francis <alistair.francis@wdc.com> Message-id: 20210306060152.7250-1-bmeng.cn@gmail.com Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2021-03-22	target/riscv: Add proper two-stage lookup exception detection	Georg Kotheimer
	The current two-stage lookup detection in riscv_cpu_do_interrupt falls short of its purpose, as all it checks is whether two-stage address translation either via the hypervisor-load store instructions or the MPRV feature would be allowed. What we really need instead is whether two-stage address translation was active when the exception was raised. However, in riscv_cpu_do_interrupt we do not have the information to reliably detect this. Therefore, when we raise a memory fault exception we have to record whether two-stage address translation is active. Signed-off-by: Georg Kotheimer <georg.kotheimer@kernkonzept.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-id: 20210319141459.1196741-1-georg.kotheimer@kernkonzept.com Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2021-03-22	target/riscv: Fix read and write accesses to vsip and vsie	Georg Kotheimer
	The previous implementation was broken in many ways: - Used mideleg instead of hideleg to mask accesses - Used MIP_VSSIP instead of VS_MODE_INTERRUPTS to mask writes to vsie - Did not shift between S bits and VS bits (VSEIP <-> SEIP, ...) Signed-off-by: Georg Kotheimer <georg.kotheimer@kernkonzept.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-id: 20210311094738.1376795-1-georg.kotheimer@kernkonzept.com Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2021-03-22	hw/riscv: allow ramfb on virt	Asherah Connor
	Allow ramfb on virt. This lets `-device ramfb' work. Signed-off-by: Asherah Connor <ashe@kivikakk.ee> Reviewed-by: Bin Meng <bmeng.cn@gmail.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-id: 20210318235041.17175-3-ashe@kivikakk.ee Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2021-03-22	hw/riscv: Add fw_cfg support to virt	Asherah Connor
	Provides fw_cfg for the virt machine on riscv. This enables using e.g. ramfb later. Signed-off-by: Asherah Connor <ashe@kivikakk.ee> Reviewed-by: Bin Meng <bmeng.cn@gmail.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-id: 20210318235041.17175-2-ashe@kivikakk.ee Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2021-03-22	target/riscv: Use background registers also for MSTATUS_MPV	Georg Kotheimer
	The current condition for the use of background registers only considers the hypervisor load and store instructions, but not accesses from M mode via MSTATUS_MPRV+MPV. Signed-off-by: Georg Kotheimer <georg.kotheimer@kernkonzept.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-id: 20210311103036.1401073-1-georg.kotheimer@kernkonzept.com Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2021-03-22	target/riscv: Make VSTIP and VSEIP read-only in hip	Georg Kotheimer
	Signed-off-by: Georg Kotheimer <georg.kotheimer@kernkonzept.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-id: 20210311094902.1377593-1-georg.kotheimer@kernkonzept.com Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2021-03-22	target/riscv: Adjust privilege level for HLV(X)/HSV instructions	Georg Kotheimer
	According to the specification the "field SPVP of hstatus controls the privilege level of the access" for the hypervisor virtual-machine load and store instructions HLV, HLVX and HSV. Signed-off-by: Georg Kotheimer <georg.kotheimer@kernkonzept.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-id: 20210311103005.1400718-1-georg.kotheimer@kernkonzept.com Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2021-03-22	target/riscv: flush TLB pages if PMP permission has been changed	Jim Shu
	If PMP permission of any address has been changed by updating PMP entry, flush all TLB pages to prevent from getting old permission. Signed-off-by: Jim Shu <cwshu@andestech.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-id: 1613916082-19528-4-git-send-email-cwshu@andestech.com Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2021-03-22	target/riscv: add log of PMP permission checking	Jim Shu
	Like MMU translation, add qemu log of PMP permission checking for debugging. Signed-off-by: Jim Shu <cwshu@andestech.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-id: 1613916082-19528-3-git-send-email-cwshu@andestech.com Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2021-03-22	target/riscv: propagate PMP permission to TLB page	Jim Shu
	Currently, PMP permission checking of TLB page is bypassed if TLB hits Fix it by propagating PMP permission to TLB page permission. PMP permission checking also use MMU-style API to change TLB permission and size. Signed-off-by: Jim Shu <cwshu@andestech.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-id: 1613916082-19528-2-git-send-email-cwshu@andestech.com Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2021-03-22	hw/char: disable ibex uart receive if the buffer is full	Alexander Wagner
	Not disabling the UART leads to QEMU overwriting the UART receive buffer with the newest received byte. The rx_level variable is added to allow the use of the existing OpenTitan driver libraries. Signed-off-by: Alexander Wagner <alexander.wagner@ulal.de> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-id: 20210309152130.13038-1-alexander.wagner@ulal.de Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2021-03-22	target/riscv: fix vs() to return proper error code	Frank Chang
	vs() should return -RISCV_EXCP_ILLEGAL_INST instead of -1 if rvv feature is not enabled. If -1 is returned, exception will be raised and cs->exception_index will be set to the negative return value. The exception will then be treated as an instruction access fault instead of illegal instruction fault. Signed-off-by: Frank Chang <frank.chang@sifive.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-id: 20210223065935.20208-1-frank.chang@sifive.com Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2021-03-22	acpi: Move setters/getters of oem fields to X86MachineState	Marian Postevca
	The code that sets/gets oem fields is duplicated in both PC and MICROVM variants. This commit moves it to X86MachineState so that all x86 variants can use it and duplication is removed. Signed-off-by: Marian Postevca <posteuca@mutex.one> Message-Id: <20210221001737.24499-2-posteuca@mutex.one> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-03-22	acpi: Set proper maximum size for "etc/acpi/rsdp" blob	David Hildenbrand
	Let's also set a maximum size for "etc/acpi/rsdp", so the maximum size doesn't get implicitly set based on the initial table size. In my experiments, the table size was in the range of 22 bytes, so a single page (== what we used until now) seems to be good enough. Now that we have defined maximum sizes for all currently used table types, let's assert that we catch usage with new tables that need a proper maximum size definition. Also assert that our initial size does not exceed the maximum size; while qemu_ram_alloc_internal() properly asserts that the initial RAMBlock size is <= its maximum size, the result might differ when the host page size is bigger than 4k. Suggested-by: Laszlo Ersek <lersek@redhat.com> Cc: Alistair Francis <alistair.francis@xilinx.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Shannon Zhao <shannon.zhaosl@gmail.com> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Laszlo Ersek <lersek@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210304105554.121674-5-david@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-03-22	acpi: Move maximum size logic into acpi_add_rom_blob()	David Hildenbrand
	We want to have safety margins for all tables based on the table type. Let's move the maximum size logic into acpi_add_rom_blob() and make it dependent on the table name, so we don't have to replicate for each and every instance that creates such tables. Suggested-by: Laszlo Ersek <lersek@redhat.com> Cc: Alistair Francis <alistair.francis@xilinx.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Shannon Zhao <shannon.zhaosl@gmail.com> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Laszlo Ersek <lersek@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210304105554.121674-4-david@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-03-22	microvm: Don't open-code "etc/table-loader"	David Hildenbrand
	Let's just reuse ACPI_BUILD_LOADER_FILE. Cc: Alistair Francis <alistair.francis@xilinx.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Shannon Zhao <shannon.zhaosl@gmail.com> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Laszlo Ersek <lersek@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210304105554.121674-3-david@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-03-22	acpi: Set proper maximum size for "etc/table-loader" blob	David Hildenbrand
	The resizeable memory region / RAMBlock that is created for the cmd blob has a maximum size of whole host pages (e.g., 4k), because RAMBlocks work on full host pages. In addition, in i386 ACPI code: acpi_align_size(tables->linker->cmd_blob, ACPI_BUILD_ALIGN_SIZE); makes sure to align to multiples of 4k, padding with 0. For example, if our cmd_blob is created with a size of 2k, the maximum size is 4k - we cannot grow beyond that. Growing might be required due to guest action when rebuilding the tables, but also on incoming migration. This automatic generation of the maximum size used to be sufficient, however, there are cases where we cross host pages now when growing at runtime: we exceed the maximum size of the RAMBlock and can crash QEMU when trying to resize the resizeable memory region / RAMBlock: $ build/qemu-system-x86_64 --enable-kvm \ -machine q35,nvdimm=on \ -smp 1 \ -cpu host \ -m size=2G,slots=8,maxmem=4G \ -object memory-backend-file,id=mem0,mem-path=/tmp/nvdimm,size=256M \ -device nvdimm,label-size=131072,memdev=mem0,id=nvdimm0,slot=1 \ -nodefaults \ -device vmgenid \ -device intel-iommu Results in: Unexpected error in qemu_ram_resize() at ../softmmu/physmem.c:1850: qemu-system-x86_64: Size too large: /rom@etc/table-loader: 0x2000 > 0x1000: Invalid argument In this configuration, we consume exactly 4k (32 entries, 128 bytes each) when creating the VM. However, once the guest boots up and maps the MCFG, we also create the MCFG table and end up consuming 2 additional entries (pointer + checksum) -- which is where we try resizing the memory region / RAMBlock, however, the maximum size does not allow for it. Currently, we get the following maximum sizes for our different mutable tables based on behavior of resizeable RAMBlock: hw table max_size ------- --------------------------------------------------------- virt "etc/acpi/tables" ACPI_BUILD_TABLE_MAX_SIZE (0x200000) virt "etc/table-loader" HOST_PAGE_ALIGN(initial_size) virt "etc/acpi/rsdp" HOST_PAGE_ALIGN(initial_size) i386 "etc/acpi/tables" ACPI_BUILD_TABLE_MAX_SIZE (0x200000) i386 "etc/table-loader" HOST_PAGE_ALIGN(initial_size) i386 "etc/acpi/rsdp" HOST_PAGE_ALIGN(initial_size) microvm "etc/acpi/tables" ACPI_BUILD_TABLE_MAX_SIZE (0x200000) microvm "etc/table-loader" HOST_PAGE_ALIGN(initial_size) microvm "etc/acpi/rsdp" HOST_PAGE_ALIGN(initial_size) Let's set the maximum table size for "etc/table-loader" to 64k, so we can properly grow at runtime, which should be good enough for the future. Migration is not concerned with the maximum size of a RAMBlock, only with the used size - so existing setups are not affected. Of course, we cannot migrate a VM that would have crash when started on older QEMU from new QEMU to older QEMU without failing early on the destination when synchronizing the RAM state: qemu-system-x86_64: Size too large: /rom@etc/table-loader: 0x2000 > 0x1000: Invalid argument qemu-system-x86_64: error while loading state for instance 0x0 of device 'ram' qemu-system-x86_64: load of migration failed: Invalid argument We'll refactor the code next, to make sure we get rid of this implicit behavior for "etc/acpi/rsdp" as well and to make the code easier to grasp. Reviewed-by: Igor Mammedov <imammedo@redhat.com> Cc: Alistair Francis <alistair.francis@xilinx.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Shannon Zhao <shannon.zhaosl@gmail.com> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Laszlo Ersek <lersek@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210304105554.121674-2-david@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-03-22	tests: acpi: update expected blobs	Igor Mammedov
	expected changes are: * larger BNMR operation region * new PIDX field and method to fetch acpi-index * PDSM method that implements PCI device _DSM + per device _DSM that calls PDSM @@ -221,10 +221,11 @@ DefinitionBlock ("", "DSDT", 1, "BOCHS ", "BXPC ", 0x00000001) B0EJ, 32 } - OperationRegion (BNMR, SystemIO, 0xAE10, 0x04) + OperationRegion (BNMR, SystemIO, 0xAE10, 0x08) Field (BNMR, DWordAcc, NoLock, WriteAsZeros) { - BNUM, 32 + BNUM, 32, + PIDX, 32 } Mutex (BLCK, 0x00) @@ -236,6 +237,52 @@ DefinitionBlock ("", "DSDT", 1, "BOCHS ", "BXPC ", 0x00000001) Release (BLCK) Return (Zero) } + + Method (AIDX, 2, NotSerialized) + { + Acquire (BLCK, 0xFFFF) + BNUM = Arg0 + PIDX = (One << Arg1) + Local0 = PIDX /* \_SB_.PCI0.PIDX / + Release (BLCK) + Return (Local0) + } + + Method (PDSM, 6, Serialized) + { + If ((Arg0 == ToUUID ("e5c937d0-3553-4d7a-9117-ea4d19c3434d") / Device Labeling Interface */)) + { + Local0 = AIDX (Arg4, Arg5) + If ((Arg2 == Zero)) + { + If ((Arg1 == 0x02)) + { + If (!((Local0 == Zero) \| (Local0 == 0xFFFFFFFF))) + { + Return (Buffer (One) + { + 0x81 // . + }) + } + } + + Return (Buffer (One) + { + 0x00 // . + }) + } + ElseIf ((Arg2 == 0x07)) + { + Local1 = Package (0x02) + { + Zero, + "" + } + Local1 [Zero] = Local0 + Return (Local1) + } + } + } } Scope (_SB) @@ -785,7 +832,7 @@ DefinitionBlock ("", "DSDT", 1, "BOCHS ", "BXPC ", 0x00000001) 0xAE00, // Range Minimum 0xAE00, // Range Maximum 0x01, // Alignment - 0x14, // Length + 0x18, // Length ) }) } @@ -842,11 +889,22 @@ DefinitionBlock ("", "DSDT", 1, "BOCHS ", "BXPC ", 0x00000001) Device (S00) { Name (_ADR, Zero) // _ADR: Address + Name (_SUN, Zero) // _SUN: Slot User Number + Method (_DSM, 4, Serialized) // _DSM: Device-Specific Method + { + Return (PDSM (Arg0, Arg1, Arg2, Arg3, BSEL, _SUN)) + } } Device (S10) { Name (_ADR, 0x00020000) // _ADR: Address + Name (_SUN, 0x02) // _SUN: Slot User Number + Method (_DSM, 4, Serialized) // _DSM: Device-Specific Method + { + Return (PDSM (Arg0, Arg1, Arg2, Arg3, BSEL, _SUN)) + } + Method (_S1D, 0, NotSerialized) // _S1D: S1 Device State { Return (Zero) [...] Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20210315180102.3008391-7-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-03-22	pci: acpi: add _DSM method to PCI devices	Igor Mammedov
	Implement _DSM according to: PCI Firmware Specification 3.1 4.6.7. DSM for Naming a PCI or PCI Express Device Under Operating Systems and wire it up to cold and hot-plugged PCI devices. Feature depends on ACPI hotplug being enabled (as that provides PCI devices descriptions in ACPI and MMIO registers that are reused to fetch acpi-index). acpi-index should work for - cold plugged NICs: $QEMU -device e1000,acpi-index=100 => 'eno100' - hot-plugged (monitor) device_add e1000,acpi-index=200,id=remove_me => 'eno200' - re-plugged (monitor) device_del remove_me (monitor) device_add e1000,acpi-index=1 => 'eno1' Windows also sees index under "PCI Label Id" field in properties dialog but otherwise it doesn't seem to have any effect. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20210315180102.3008391-6-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-03-22	acpi: add aml_to_decimalstring() and aml_call6() helpers	Igor Mammedov
	it will be used by follow up patches Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20210315180102.3008391-5-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-03-22	pci: acpi: ensure that acpi-index is unique	Igor Mammedov
	it helps to avoid device naming conflicts when guest OS is configured to use acpi-index for naming. Spec ialso says so: PCI Firmware Specification Revision 3.2 4.6.7. _DSM for Naming a PCI or PCI Express Device Under Operating Systems " Instance number must be unique under \_SB scope. This instance number does not have to be sequential in a given system configuration. " Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20210315180102.3008391-4-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-03-22	pci: introduce acpi-index property for PCI device	Igor Mammedov
	In x86/ACPI world, linux distros are using predictable network interface naming since systemd v197. Which on QEMU based VMs results into path based naming scheme, that names network interfaces based on PCI topology. With itm on has to plug NIC in exactly the same bus/slot, which was used when disk image was first provisioned/configured or one risks to loose network configuration due to NIC being renamed to actually used topology. That also restricts freedom to reshape PCI configuration of VM without need to reconfigure used guest image. systemd also offers "onboard" naming scheme which is preferred over PCI slot/topology one, provided that firmware implements: " PCI Firmware Specification 3.1 4.6.7. DSM for Naming a PCI or PCI Express Device Under Operating Systems " that allows to assign user defined index to PCI device, which systemd will use to name NIC. For example, using -device e1000,acpi-index=100 guest will rename NIC to 'eno100', where 'eno' is default prefix for "onboard" naming scheme. This doesn't require any advance configuration on guest side to com in effect at 'onboard' scheme takes priority over path based naming. Hope is that 'acpi-index' it will be easier to consume by management layer, compared to forcing specific PCI topology and/or having several disk image templates for different topologies and will help to simplify process of spawning VM from the same template without need to reconfigure guest NIC. This patch adds, 'acpi-index'* property and wires up a 32bit register on top of pci hotplug register block to pass index value to AML code at runtime. Following patch will add corresponding _DSM code and wire it up to PCI devices described in ACPI. *) name comes from linux kernel terminology Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20210315180102.3008391-3-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-03-22	tests: acpi: temporary whitelist DSDT changes	Igor Mammedov
	Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20210315180102.3008391-2-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-03-22	Merge remote-tracking branch 'remotes/philmd/tags/sdmmc-20210322' into staging	Peter Maydell
	SD/MMC patches queue - Fix build error when DEBUG_SD is on - Perform SD ERASE operation - SDHCI ADMA heap buffer overflow (CVE-2020-17380, CVE-2020-25085, CVE-2021-3409) # gpg: Signature made Mon 22 Mar 2021 17:13:43 GMT # gpg: using RSA key FAABE75E12917221DCFD6BB2E3E32C2CDEADC0DE # gpg: Good signature from "Philippe Mathieu-Daudé (F4BUG) <f4bug@amsat.org>" [full] # Primary key fingerprint: FAAB E75E 1291 7221 DCFD 6BB2 E3E3 2C2C DEAD C0DE * remotes/philmd/tags/sdmmc-20210322: hw/sd: sdhci: Reset the data pointer of s->fifo_buffer[] when a different block size is programmed hw/sd: sdhci: Limit block size only when SDHC_BLKSIZE register is writable hw/sd: sdhci: Correctly set the controller status for ADMA hw/sd: sdhci: Don't write to SDHC_SYSAD register when transfer is in progress hw/sd: sdhci: Don't transfer any data when command time out hw/sd: sd: Actually perform the erase operation hw/sd: sd: Fix build error when DEBUG_SD is on Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-03-22	hw/sd: sdhci: Reset the data pointer of s->fifo_buffer[] when a different ↵	Bin Meng
	block size is programmed If the block size is programmed to a different value from the previous one, reset the data pointer of s->fifo_buffer[] so that s->fifo_buffer[] can be filled in using the new block size in the next transfer. With this fix, the following reproducer: outl 0xcf8 0x80001010 outl 0xcfc 0xe0000000 outl 0xcf8 0x80001001 outl 0xcfc 0x06000000 write 0xe000002c 0x1 0x05 write 0xe0000005 0x1 0x02 write 0xe0000007 0x1 0x01 write 0xe0000028 0x1 0x10 write 0x0 0x1 0x23 write 0x2 0x1 0x08 write 0xe000000c 0x1 0x01 write 0xe000000e 0x1 0x20 write 0xe000000f 0x1 0x00 write 0xe000000c 0x1 0x32 write 0xe0000004 0x2 0x0200 write 0xe0000028 0x1 0x00 write 0xe0000003 0x1 0x40 cannot be reproduced with the following QEMU command line: $ qemu-system-x86_64 -nographic -machine accel=qtest -m 512M \ -nodefaults -device sdhci-pci,sd-spec-version=3 \ -drive if=sd,index=0,file=null-co://,format=raw,id=mydrive \ -device sd-card,drive=mydrive -qtest stdio Cc: qemu-stable@nongnu.org Fixes: CVE-2020-17380 Fixes: CVE-2020-25085 Fixes: CVE-2021-3409 Fixes: d7dfca0807a0 ("hw/sdhci: introduce standard SD host controller") Reported-by: Alexander Bulekov <alxndr@bu.edu> Reported-by: Cornelius Aschermann (Ruhr-Universität Bochum) Reported-by: Sergej Schumilo (Ruhr-Universität Bochum) Reported-by: Simon Wörner (Ruhr-Universität Bochum) Buglink: https://bugs.launchpad.net/qemu/+bug/1892960 Buglink: https://bugs.launchpad.net/qemu/+bug/1909418 Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1928146 Tested-by: Alexander Bulekov <alxndr@bu.edu> Signed-off-by: Bin Meng <bmeng.cn@gmail.com> Message-Id: <20210303122639.20004-6-bmeng.cn@gmail.com> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
2021-03-22	hw/sd: sdhci: Limit block size only when SDHC_BLKSIZE register is writable	Bin Meng
	The codes to limit the maximum block size is only necessary when SDHC_BLKSIZE register is writable. Tested-by: Alexander Bulekov <alxndr@bu.edu> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Bin Meng <bmeng.cn@gmail.com> Message-Id: <20210303122639.20004-5-bmeng.cn@gmail.com> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
2021-03-22	hw/sd: sdhci: Correctly set the controller status for ADMA	Bin Meng
	When an ADMA transfer is started, the codes forget to set the controller status to indicate a transfer is in progress. With this fix, the following 2 reproducers: https://paste.debian.net/plain/1185136 https://paste.debian.net/plain/1185141 cannot be reproduced with the following QEMU command line: $ qemu-system-x86_64 -nographic -machine accel=qtest -m 512M \ -nodefaults -device sdhci-pci,sd-spec-version=3 \ -drive if=sd,index=0,file=null-co://,format=raw,id=mydrive \ -device sd-card,drive=mydrive -qtest stdio Cc: qemu-stable@nongnu.org Fixes: CVE-2020-17380 Fixes: CVE-2020-25085 Fixes: CVE-2021-3409 Fixes: d7dfca0807a0 ("hw/sdhci: introduce standard SD host controller") Reported-by: Alexander Bulekov <alxndr@bu.edu> Reported-by: Cornelius Aschermann (Ruhr-Universität Bochum) Reported-by: Sergej Schumilo (Ruhr-Universität Bochum) Reported-by: Simon Wörner (Ruhr-Universität Bochum) Buglink: https://bugs.launchpad.net/qemu/+bug/1892960 Buglink: https://bugs.launchpad.net/qemu/+bug/1909418 Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1928146 Tested-by: Alexander Bulekov <alxndr@bu.edu> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Bin Meng <bmeng.cn@gmail.com> Message-Id: <20210303122639.20004-4-bmeng.cn@gmail.com> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
2021-03-22	hw/sd: sdhci: Don't write to SDHC_SYSAD register when transfer is in progress	Bin Meng
	Per "SD Host Controller Standard Specification Version 7.00" chapter 2.2.1 SDMA System Address Register: This register can be accessed only if no transaction is executing (i.e., after a transaction has stopped). With this fix, the following reproducer: outl 0xcf8 0x80001010 outl 0xcfc 0xfbefff00 outl 0xcf8 0x80001001 outl 0xcfc 0x06000000 write 0xfbefff2c 0x1 0x05 write 0xfbefff0f 0x1 0x37 write 0xfbefff0a 0x1 0x01 write 0xfbefff0f 0x1 0x29 write 0xfbefff0f 0x1 0x02 write 0xfbefff0f 0x1 0x03 write 0xfbefff04 0x1 0x01 write 0xfbefff05 0x1 0x01 write 0xfbefff07 0x1 0x02 write 0xfbefff0c 0x1 0x33 write 0xfbefff0e 0x1 0x20 write 0xfbefff0f 0x1 0x00 write 0xfbefff2a 0x1 0x01 write 0xfbefff0c 0x1 0x00 write 0xfbefff03 0x1 0x00 write 0xfbefff05 0x1 0x00 write 0xfbefff2a 0x1 0x02 write 0xfbefff0c 0x1 0x32 write 0xfbefff01 0x1 0x01 write 0xfbefff02 0x1 0x01 write 0xfbefff03 0x1 0x01 cannot be reproduced with the following QEMU command line: $ qemu-system-x86_64 -nographic -machine accel=qtest -m 512M \ -nodefaults -device sdhci-pci,sd-spec-version=3 \ -drive if=sd,index=0,file=null-co://,format=raw,id=mydrive \ -device sd-card,drive=mydrive -qtest stdio Cc: qemu-stable@nongnu.org Fixes: CVE-2020-17380 Fixes: CVE-2020-25085 Fixes: CVE-2021-3409 Fixes: d7dfca0807a0 ("hw/sdhci: introduce standard SD host controller") Reported-by: Alexander Bulekov <alxndr@bu.edu> Reported-by: Cornelius Aschermann (Ruhr-Universität Bochum) Reported-by: Sergej Schumilo (Ruhr-Universität Bochum) Reported-by: Simon Wörner (Ruhr-Universität Bochum) Buglink: https://bugs.launchpad.net/qemu/+bug/1892960 Buglink: https://bugs.launchpad.net/qemu/+bug/1909418 Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1928146 Tested-by: Alexander Bulekov <alxndr@bu.edu> Signed-off-by: Bin Meng <bmeng.cn@gmail.com> Message-Id: <20210303122639.20004-3-bmeng.cn@gmail.com> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
2021-03-22	hw/sd: sdhci: Don't transfer any data when command time out	Bin Meng
	At the end of sdhci_send_command(), it starts a data transfer if the command register indicates data is associated. But the data transfer should only be initiated when the command execution has succeeded. With this fix, the following reproducer: outl 0xcf8 0x80001810 outl 0xcfc 0xe1068000 outl 0xcf8 0x80001804 outw 0xcfc 0x7 write 0xe106802c 0x1 0x0f write 0xe1068004 0xc 0x2801d10101fffffbff28a384 write 0xe106800c 0x1f 0x9dacbbcad9e8f7061524334251606f7e8d9cabbac9d8e7f60514233241505f write 0xe1068003 0x28 0x80d000251480d000252280d000253080d000253e80d000254c80d000255a80d000256880d0002576 write 0xe1068003 0x1 0xfe cannot be reproduced with the following QEMU command line: $ qemu-system-x86_64 -nographic -M pc-q35-5.0 \ -device sdhci-pci,sd-spec-version=3 \ -drive if=sd,index=0,file=null-co://,format=raw,id=mydrive \ -device sd-card,drive=mydrive \ -monitor none -serial none -qtest stdio Cc: qemu-stable@nongnu.org Fixes: CVE-2020-17380 Fixes: CVE-2020-25085 Fixes: CVE-2021-3409 Fixes: d7dfca0807a0 ("hw/sdhci: introduce standard SD host controller") Reported-by: Alexander Bulekov <alxndr@bu.edu> Reported-by: Cornelius Aschermann (Ruhr-Universität Bochum) Reported-by: Sergej Schumilo (Ruhr-Universität Bochum) Reported-by: Simon Wörner (Ruhr-Universität Bochum) Buglink: https://bugs.launchpad.net/qemu/+bug/1892960 Buglink: https://bugs.launchpad.net/qemu/+bug/1909418 Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1928146 Acked-by: Alistair Francis <alistair.francis@wdc.com> Tested-by: Alexander Bulekov <alxndr@bu.edu> Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Bin Meng <bmeng.cn@gmail.com> Message-Id: <20210303122639.20004-2-bmeng.cn@gmail.com> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
2021-03-22	hw/sd: sd: Actually perform the erase operation	Bin Meng
	At present the sd_erase() does not erase the requested range of card data to 0xFFs. Let's make the erase operation actually happen. Signed-off-by: Bin Meng <bin.meng@windriver.com> Message-Id: <1613811493-58815-1-git-send-email-bmeng.cn@gmail.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
2021-03-22	hw/sd: sd: Fix build error when DEBUG_SD is on	Bin Meng
	"qemu-common.h" should be included to provide the forward declaration of qemu_hexdump() when DEBUG_SD is on. Signed-off-by: Bin Meng <bin.meng@windriver.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20210228050609.24779-1-bmeng.cn@gmail.com> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
2021-03-22	Merge remote-tracking branch 'remotes/philmd/tags/mips-fixes-20210322' into ↵	Peter Maydell
	staging MIPS patches queue - Fix array overrun (Coverity CID 1450831) - Deprecate KVM TE (Trap-and-Emul) # gpg: Signature made Mon 22 Mar 2021 14:06:48 GMT # gpg: using RSA key FAABE75E12917221DCFD6BB2E3E32C2CDEADC0DE # gpg: Good signature from "Philippe Mathieu-Daudé (F4BUG) <f4bug@amsat.org>" [full] # Primary key fingerprint: FAAB E75E 1291 7221 DCFD 6BB2 E3E3 2C2C DEAD C0DE * remotes/philmd/tags/mips-fixes-20210322: target/mips: Deprecate Trap-and-Emul KVM support target/mips/mxu_translate.c: Fix array overrun for D16MIN/D16MAX Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-03-22	virtio-pmem: fix virtio_pmem_resp assign problem	Wang Liang
	ret in virtio_pmem_resp is a uint32_t variable, which should be assigned using virtio_stl_p. The kernel side driver does not guarantee virtio_pmem_resp to be initialized to zero in advance, So sometimes the flush operation will fail. Signed-off-by: Wang Liang <wangliangzz@inspur.com> Message-Id: <20210317024145.271212-1-wangliangzz@126.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-03-22	vhost-user: Monitor slave channel in vhost_user_read()	Greg Kurz
	Now that everything is in place, have the nested event loop to monitor the slave channel. The source in the main event loop is destroyed and recreated to ensure any pending even for the slave channel that was previously detected is purged. This guarantees that the main loop wont invoke slave_read() based on an event that was already handled by the nested loop. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20210312092212.782255-7-groug@kaod.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-03-22	vhost-user: Introduce nested event loop in vhost_user_read()	Greg Kurz
	A deadlock condition potentially exists if a vhost-user process needs to request something to QEMU on the slave channel while processing a vhost-user message. This doesn't seem to affect any vhost-user implementation so far, but this is currently biting the upcoming enablement of DAX with virtio-fs. The issue is being observed when the guest does an emergency reboot while a mapping still exits in the DAX window, which is very easy to get with a busy enough workload (e.g. as simulated by blogbench [1]) : - QEMU sends VHOST_USER_GET_VRING_BASE to virtiofsd. - In order to complete the request, virtiofsd then asks QEMU to remove the mapping on the slave channel. All these dialogs are synchronous, hence the deadlock. As pointed out by Stefan Hajnoczi: When QEMU's vhost-user master implementation sends a vhost-user protocol message, vhost_user_read() does a "blocking" read during which slave_fd is not monitored by QEMU. The natural solution for this issue is an event loop. The main event loop cannot be nested though since we have no guarantees that its fd handlers are prepared for re-entrancy. Introduce a new event loop that only monitors the chardev I/O for now in vhost_user_read() and push the actual reading to a one-shot handler. A subsequent patch will teach the loop to monitor and process messages from the slave channel as well. [1] https://github.com/jedisct1/Blogbench Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20210312092212.782255-6-groug@kaod.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>