aboutsummaryrefslogtreecommitdiff
path: root/hw
AgeCommit message (Collapse)Author
2021-03-30hw/timer/renesas_tmr: Add default-case asserts in read_tcnt()Peter Maydell
In commit 81b3ddaf8772ec we fixed a use of uninitialized data in read_tcnt(). However this change wasn't enough to placate Coverity, which is not smart enough to see that if we read a 2 bit field and then handle cases 0, 1, 2 and 3 then there cannot be a flow of execution through the switch default. Add explicit default cases which assert that they can't be reached, which should help silence Coverity. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 20210319162458.13760-1-peter.maydell@linaro.org
2021-03-30hw/arm/smmuv3: Drop unused CDM_VALID() and is_cd_valid()Zenghui Yu
They were introduced in commit 9bde7f0674fe ("hw/arm/smmuv3: Implement translate callback") but never actually used. Drop them. Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Acked-by: Eric Auger <eric.auger@redhat.com> Message-id: 20210325142702.790-1-yuzenghui@huawei.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-03-30hw/display/xlnx_dp: Free FIFOs adding xlnx_dp_finalize()Philippe Mathieu-Daudé
When building with --enable-sanitizers we get: Direct leak of 16 byte(s) in 1 object(s) allocated from: #0 0x5618479ec7cf in malloc (qemu-system-aarch64+0x233b7cf) #1 0x7f675745f958 in g_malloc (/lib64/libglib-2.0.so.0+0x58958) #2 0x561847c2dcc9 in xlnx_dp_init hw/display/xlnx_dp.c:1259:5 #3 0x56184a5bdab8 in object_init_with_type qom/object.c:375:9 #4 0x56184a5a2bda in object_initialize_with_type qom/object.c:517:5 #5 0x56184a5a24d5 in object_initialize qom/object.c:536:5 #6 0x56184a5a2f6c in object_initialize_child_with_propsv qom/object.c:566:5 #7 0x56184a5a2e60 in object_initialize_child_with_props qom/object.c:549:10 #8 0x56184a5a3a1e in object_initialize_child_internal qom/object.c:603:5 #9 0x5618495aa431 in xlnx_zynqmp_init hw/arm/xlnx-zynqmp.c:273:5 The RX/TX FIFOs are created in xlnx_dp_init(), add xlnx_dp_finalize() to destroy them. Fixes: 58ac482a66d ("introduce xlnx-dp") Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-id: 20210323182958.277654-1-f4bug@amsat.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-03-30net/npcm7xx_emc.c: Fix handling of receiving packets when RSDR not setDoug Evans
Turning REG_MCMDR_RXON is enough to start receiving packets. Signed-off-by: Doug Evans <dje@google.com> Message-id: 20210319195044.741821-1-dje@google.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-03-29hw/block/nvme: fix ref counting in nvme_format_nsKlaus Jensen
Max noticed that since blk_aio_pwrite_zeroes() may invoke the callback before returning, the callbacks will never see *count == 0 and thus never free the count variable or decrement num_formats causing a CQE to never be posted. Coverity (CID 1451082) also picked up on the fact that count would not be free'ed if the namespace was of zero size. Fix both of these issues by explicitly checking *count and finalize for the given namespace if --(*count) is zero. Enqueing a CQE if there are no AIOs outstanding after this case is already handled by nvme_format() by inspecting *num_formats. Reported-by: Max Reitz <mreitz@redhat.com> Reported-by: Coverity (CID 1451082) Fixes: dc04d25e2f3f ("hw/block/nvme: add support for the format nvm command") Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
2021-03-29hw/block/nvme: fix resource leak in nvme_dif_rwKlaus Jensen
If nvme_map_dptr() fails, nvme_dif_rw() will leak the bounce context. Fix this by using the same error handling as everywhere else in the function. Reported-by: Coverity (CID 1451080) Fixes: 146f720c5563 ("hw/block/nvme: end-to-end data protection") Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
2021-03-26hw/usb/hcd-ehci: Fix crash when showing help of EHCI devicesThomas Huth
QEMU crashes with certain targets when trying to show the help output of EHCI devices: $ ./qemu-system-aarch64 -device ich9-usb-ehci1,help qemu-system-aarch64: ../../devel/qemu/softmmu/physmem.c:1154: phys_section_add: Assertion `map->sections_nb < TARGET_PAGE_SIZE' failed. Aborted (core dumped) This happens because the device is doing things at "instance_init" time that should be done at "realize" time instead. So move the related code to the realize() function instead. (NB: This now also matches the memory_region_del_subregion() calls which are done in usb_ehci_unrealize(), and not during finalize()). Suggested-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20210326095155.1994604-1-thuth@redhat.com> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2021-03-26s390x: modularize virtio-gpu-ccwGerd Hoffmann
Since the virtio-gpu-ccw device depends on the hw-display-virtio-gpu module, which provides the type virtio-gpu-device, packaging the hw-display-virtio-gpu module as a separate package that may or may not be installed along with the qemu package leads to problems. Namely if the hw-display-virtio-gpu is absent, qemu continues to advertise virtio-gpu-ccw, but it aborts not only when one attempts using virtio-gpu-ccw, but also when libvirtd's capability probing tries to instantiate the type to introspect it. Let us thus introduce a module named hw-s390x-virtio-gpu-ccw that is going to provide the virtio-gpu-ccw device. The hw-s390x prefix was chosen because it is not a portable device. With virtio-gpu-ccw built as a module, the correct way to package a modularized qemu is to require that hw-display-virtio-gpu must be installed whenever the module hw-s390x-virtio-gpu-ccw. Signed-off-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Tested-by: Halil Pasic <pasic@linux.ibm.com> Message-Id: <20210317095622.2839895-4-kraxel@redhat.com> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2021-03-26s390x: add have_virtio_ccwGerd Hoffmann
Introduce a symbol which can be used to prevent ccw modules being loaded into system emulators without ccw support. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Tested-by: Halil Pasic <pasic@linux.ibm.com> Message-Id: <20210317095622.2839895-3-kraxel@redhat.com> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2021-03-26hw/usb/hcd-ehci-sysbus: Free USBPacket on instance finalize()Philippe Mathieu-Daudé
When building with --enable-sanitizers we get: Direct leak of 32 byte(s) in 2 object(s) allocated from: #0 0x5618479ec7cf in malloc (qemu-system-aarch64+0x233b7cf) #1 0x7f675745f958 in g_malloc (/lib64/libglib-2.0.so.0+0x58958) #2 0x561847f02ca2 in usb_packet_init hw/usb/core.c:531:5 #3 0x561848df4df4 in usb_ehci_init hw/usb/hcd-ehci.c:2575:5 #4 0x561847c119ac in ehci_sysbus_init hw/usb/hcd-ehci-sysbus.c:73:5 #5 0x56184a5bdab8 in object_init_with_type qom/object.c:375:9 #6 0x56184a5bd955 in object_init_with_type qom/object.c:371:9 #7 0x56184a5a2bda in object_initialize_with_type qom/object.c:517:5 #8 0x56184a5a24d5 in object_initialize qom/object.c:536:5 #9 0x56184a5a2f6c in object_initialize_child_with_propsv qom/object.c:566:5 #10 0x56184a5a2e60 in object_initialize_child_with_props qom/object.c:549:10 #11 0x56184a5a3a1e in object_initialize_child_internal qom/object.c:603:5 #12 0x561849542d18 in npcm7xx_init hw/arm/npcm7xx.c:427:5 Similarly to commit d710e1e7bd3 ("usb: ehci: fix memory leak in ehci"), fix by calling usb_ehci_finalize() to free the USBPacket. Fixes: 7341ea075c0 Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Message-Id: <20210323183701.281152-1-f4bug@amsat.org> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2021-03-26usb: Remove "-usbdevice ccid"Thomas Huth
"-usbdevice ccid" was not documented and -usbdevice itself was marked as deprecated before QEMU v6.0. And searching for "-usbdevice ccid" in the internet does not show any useful results, so likely nobody was using the ccid device via the -usbdevice option. Remove it now. Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20210311092829.1479051-1-thuth@redhat.com> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2021-03-23Merge remote-tracking branch 'remotes/kraxel/tags/ui-20210323-pull-request' ↵Peter Maydell
into staging fixes for 6.0 # gpg: Signature made Tue 23 Mar 2021 15:36:06 GMT # gpg: using RSA key A0328CFFB93A17A79901FE7D4CB6D8EED3E87138 # gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>" [full] # gpg: aka "Gerd Hoffmann <gerd@kraxel.org>" [full] # gpg: aka "Gerd Hoffmann (private) <kraxel@gmail.com>" [full] # Primary key fingerprint: A032 8CFF B93A 17A7 9901 FE7D 4CB6 D8EE D3E8 7138 * remotes/kraxel/tags/ui-20210323-pull-request: edid: prefer standard timings include/ui/console.h: Delete is_surface_bgr() qmp: add new qmp display-reload vnc: support reload x509 certificates for vnc crypto: add reload for QCryptoTLSCredsClass Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-03-23Merge remote-tracking branch ↵Peter Maydell
'remotes/pmaydell/tags/pull-target-arm-20210323' into staging target-arm queue: * hw/arm/virt: Disable pl011 clock migration if needed * target/arm: Make M-profile VTOR loads on reset handle memory aliasing * target/arm: Set ARMMMUFaultInfo.level in user-only arm_cpu_tlb_fill # gpg: Signature made Tue 23 Mar 2021 14:26:09 GMT # gpg: using RSA key E1A5C593CD419DE28E8315CF3C2525ED14360CDE # gpg: issuer "peter.maydell@linaro.org" # gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>" [ultimate] # gpg: aka "Peter Maydell <pmaydell@gmail.com>" [ultimate] # gpg: aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>" [ultimate] # Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83 15CF 3C25 25ED 1436 0CDE * remotes/pmaydell/tags/pull-target-arm-20210323: target/arm: Set ARMMMUFaultInfo.level in user-only arm_cpu_tlb_fill target/arm: Make M-profile VTOR loads on reset handle memory aliasing hw/core/loader: Add new function rom_ptr_for_as() memory: Add offset_in_region to flatview_cb arguments memory: Document flatview_for_each_range() memory: Make flatview_cb return bool, not int hw/arm/virt: Disable pl011 clock migration if needed Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-03-23Merge remote-tracking branch 'remotes/aperard/tags/pull-xen-20210323' into ↵Peter Maydell
staging Xen patch - Fix Xen backend block detach via xenstore. # gpg: Signature made Tue 23 Mar 2021 11:53:08 GMT # gpg: using RSA key F80C006308E22CFD8A92E7980CF5572FD7FB55AF # gpg: Good signature from "Anthony PERARD <anthony.perard@gmail.com>" [marginal] # gpg: aka "Anthony PERARD <anthony.perard@citrix.com>" [marginal] # gpg: WARNING: This key is not certified with sufficiently trusted signatures! # gpg: It is not certain that the signature belongs to the owner. # Primary key fingerprint: 5379 2F71 024C 600F 778A 7161 D8D5 7199 DF83 42C8 # Subkey fingerprint: F80C 0063 08E2 2CFD 8A92 E798 0CF5 572F D7FB 55AF * remotes/aperard/tags/pull-xen-20210323: xen-block: Fix removal of backend instance via xenstore Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-03-23Merge remote-tracking branch ↵Peter Maydell
'remotes/alistair/tags/pull-riscv-to-apply-20210322-2' into staging RISC-V PR for 6.0 This PR includes: - Fix for vector CSR access - Improvements to the Ibex UART device - PMP improvements and bug fixes - Hypervisor extension bug fixes - ramfb support for the virt machine - Fast read support for SST flash - Improvements to the microchip_pfsoc machine # gpg: Signature made Tue 23 Mar 2021 01:56:53 GMT # gpg: using RSA key F6C4AC46D4934868D3B8CE8F21E10D29DF977054 # gpg: Good signature from "Alistair Francis <alistair@alistair23.me>" [full] # Primary key fingerprint: F6C4 AC46 D493 4868 D3B8 CE8F 21E1 0D29 DF97 7054 * remotes/alistair/tags/pull-riscv-to-apply-20210322-2: target/riscv: Prevent lost illegal instruction exceptions docs/system: riscv: Add documentation for 'microchip-icicle-kit' machine hw/riscv: microchip_pfsoc: Map EMMC/SD mux register hw/block: m25p80: Support fast read for SST flashes target/riscv: Add proper two-stage lookup exception detection target/riscv: Fix read and write accesses to vsip and vsie hw/riscv: allow ramfb on virt hw/riscv: Add fw_cfg support to virt target/riscv: Use background registers also for MSTATUS_MPV target/riscv: Make VSTIP and VSEIP read-only in hip target/riscv: Adjust privilege level for HLV(X)/HSV instructions target/riscv: flush TLB pages if PMP permission has been changed target/riscv: add log of PMP permission checking target/riscv: propagate PMP permission to TLB page hw/char: disable ibex uart receive if the buffer is full target/riscv: fix vs() to return proper error code Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-03-23hw/core/loader: Add new function rom_ptr_for_as()Peter Maydell
For accesses to rom blob data before or during reset, we have a function rom_ptr() which looks for a rom blob that would be loaded to the specified address, and returns a pointer into the rom blob data corresponding to that address. This allows board or CPU code to say "what is the data that is going to be loaded to this address?". However, this function does not take account of memory region aliases. If for instance a machine model has RAM at address 0x0000_0000 which is aliased to also appear at 0x1000_0000, a rom_ptr() query for address 0x0000_0000 will only return a match if the guest image provided by the user was loaded at 0x0000_0000 and not if it was loaded at 0x1000_0000, even though they are the same RAM and a run-time guest CPU read of 0x0000_0000 will read the data loaded to 0x1000_0000. Provide a new function rom_ptr_for_as() which takes an AddressSpace argument, so that it can check whether the MemoryRegion corresponding to the address is also mapped anywhere else in the AddressSpace and look for rom blobs that loaded to that alias. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20210318174823.18066-5-peter.maydell@linaro.org
2021-03-23hw/arm/virt: Disable pl011 clock migration if neededGavin Shan
A clock is added by commit aac63e0e6ea3 ("hw/char/pl011: add a clock input") since v5.2.0 which corresponds to virt-5.2 machine type. It causes backwards migration failure from upstream to downstream (v5.1.0) when the machine type is specified with virt-5.1. This fixes the issue by following instructions from section "Connecting subsections to properties" in docs/devel/migration.rst. With this applied, the PL011 clock is migrated based on the machine type. virt-5.2 or newer: migration virt-5.1 or older: non-migration Cc: qemu-stable@nongnu.org # v5.2.0+ Fixes: aac63e0e6ea3 ("hw/char/pl011: add a clock input") Suggested-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Message-id: 20210318023801.18287-1-gshan@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-03-23edid: prefer standard timingsGerd Hoffmann
Windows guests using the "Basic Display Adapter" don't parse the "Established timings III" block. They also don't parse any edid extension. So prefer the "Standard Timings" block to store the display resolutions in edid_fill_modes(). Also reorder the mode list, so more exotic resolutions (specifically the ones which are not supported by vgabios) are moved down and the remaining ones have a better chance to get one of the eight slots in the "Standard Timings" block. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Message-Id: <20210316143812.2363588-6-kraxel@redhat.com>
2021-03-23xen-block: Fix removal of backend instance via xenstoreAnthony PERARD
Whenever a Xen block device is detach via xenstore, the image associated with it remained open by the backend QEMU and an error is logged: qemu-system-i386: failed to destroy drive: Node xvdz-qcow2 is in use This happened since object_unparent() doesn't immediately frees the object and thus keep a reference to the node we are trying to free. The reference is hold by the "drive" property and the call xen_block_drive_destroy() fails. In order to fix that, we call drain_call_rcu() to run the callback setup by bus_remove_child() via object_unparent(). Fixes: 2d24a6466154 ("device-core: use RCU for list of children of a bus") Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Reviewed-by: Paul Durrant <paul@xen.org> Message-Id: <20210308143232.83388-1-anthony.perard@citrix.com>
2021-03-23Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into stagingPeter Maydell
pc,virtio,pci: fixes, features Fixes all over the place. ACPI index support. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # gpg: Signature made Mon 22 Mar 2021 22:58:45 GMT # gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469 # gpg: issuer "mst@redhat.com" # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full] # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full] # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * remotes/mst/tags/for_upstream: acpi: Move setters/getters of oem fields to X86MachineState acpi: Set proper maximum size for "etc/acpi/rsdp" blob acpi: Move maximum size logic into acpi_add_rom_blob() microvm: Don't open-code "etc/table-loader" acpi: Set proper maximum size for "etc/table-loader" blob tests: acpi: update expected blobs pci: acpi: add _DSM method to PCI devices acpi: add aml_to_decimalstring() and aml_call6() helpers pci: acpi: ensure that acpi-index is unique pci: introduce acpi-index property for PCI device tests: acpi: temporary whitelist DSDT changes virtio-pmem: fix virtio_pmem_resp assign problem vhost-user: Monitor slave channel in vhost_user_read() vhost-user: Introduce nested event loop in vhost_user_read() vhost-user: Convert slave channel to QIOChannelSocket vhost-user: Factor out duplicated slave_fd teardown code vhost-user: Fix double-close on slave_read() error path vhost-user: Drop misleading EAGAIN checks in slave_read() virtio: Fix virtio_mmio_read()/virtio_mmio_write() Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-03-22hw/riscv: microchip_pfsoc: Map EMMC/SD mux registerBin Meng
Since HSS commit c20a89f8dcac, the Icicle Kit reference design has been updated to use a register mapped at 0x4f000000 instead of a GPIO to control whether eMMC or SD card is to be used. With this support the same HSS image can be used for both eMMC and SD card boot flow, while previously two different board configurations were used. This is undocumented but one can take a look at the HSS code HSS_MMCInit() in services/mmc/mmc_api.c. With this commit, HSS image built from 2020.12 release boots again. Signed-off-by: Bin Meng <bin.meng@windriver.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-id: 20210322075248.136255-1-bmeng.cn@gmail.com Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2021-03-22hw/block: m25p80: Support fast read for SST flashesBin Meng
Per SST25VF016B datasheet [1], SST flash requires a dummy byte after the address bytes. Note only SPI mode is supported by SST flashes. [1] http://ww1.microchip.com/downloads/en/devicedoc/s71271_04.pdf Signed-off-by: Bin Meng <bin.meng@windriver.com> Acked-by: Alistair Francis <alistair.francis@wdc.com> Message-id: 20210306060152.7250-1-bmeng.cn@gmail.com Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2021-03-22hw/riscv: allow ramfb on virtAsherah Connor
Allow ramfb on virt. This lets `-device ramfb' work. Signed-off-by: Asherah Connor <ashe@kivikakk.ee> Reviewed-by: Bin Meng <bmeng.cn@gmail.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-id: 20210318235041.17175-3-ashe@kivikakk.ee Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2021-03-22hw/riscv: Add fw_cfg support to virtAsherah Connor
Provides fw_cfg for the virt machine on riscv. This enables using e.g. ramfb later. Signed-off-by: Asherah Connor <ashe@kivikakk.ee> Reviewed-by: Bin Meng <bmeng.cn@gmail.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-id: 20210318235041.17175-2-ashe@kivikakk.ee Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2021-03-22hw/char: disable ibex uart receive if the buffer is fullAlexander Wagner
Not disabling the UART leads to QEMU overwriting the UART receive buffer with the newest received byte. The rx_level variable is added to allow the use of the existing OpenTitan driver libraries. Signed-off-by: Alexander Wagner <alexander.wagner@ulal.de> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-id: 20210309152130.13038-1-alexander.wagner@ulal.de Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2021-03-22acpi: Move setters/getters of oem fields to X86MachineStateMarian Postevca
The code that sets/gets oem fields is duplicated in both PC and MICROVM variants. This commit moves it to X86MachineState so that all x86 variants can use it and duplication is removed. Signed-off-by: Marian Postevca <posteuca@mutex.one> Message-Id: <20210221001737.24499-2-posteuca@mutex.one> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-03-22acpi: Set proper maximum size for "etc/acpi/rsdp" blobDavid Hildenbrand
Let's also set a maximum size for "etc/acpi/rsdp", so the maximum size doesn't get implicitly set based on the initial table size. In my experiments, the table size was in the range of 22 bytes, so a single page (== what we used until now) seems to be good enough. Now that we have defined maximum sizes for all currently used table types, let's assert that we catch usage with new tables that need a proper maximum size definition. Also assert that our initial size does not exceed the maximum size; while qemu_ram_alloc_internal() properly asserts that the initial RAMBlock size is <= its maximum size, the result might differ when the host page size is bigger than 4k. Suggested-by: Laszlo Ersek <lersek@redhat.com> Cc: Alistair Francis <alistair.francis@xilinx.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Shannon Zhao <shannon.zhaosl@gmail.com> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Laszlo Ersek <lersek@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210304105554.121674-5-david@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-03-22acpi: Move maximum size logic into acpi_add_rom_blob()David Hildenbrand
We want to have safety margins for all tables based on the table type. Let's move the maximum size logic into acpi_add_rom_blob() and make it dependent on the table name, so we don't have to replicate for each and every instance that creates such tables. Suggested-by: Laszlo Ersek <lersek@redhat.com> Cc: Alistair Francis <alistair.francis@xilinx.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Shannon Zhao <shannon.zhaosl@gmail.com> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Laszlo Ersek <lersek@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210304105554.121674-4-david@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-03-22microvm: Don't open-code "etc/table-loader"David Hildenbrand
Let's just reuse ACPI_BUILD_LOADER_FILE. Cc: Alistair Francis <alistair.francis@xilinx.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Shannon Zhao <shannon.zhaosl@gmail.com> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Laszlo Ersek <lersek@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210304105554.121674-3-david@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-03-22acpi: Set proper maximum size for "etc/table-loader" blobDavid Hildenbrand
The resizeable memory region / RAMBlock that is created for the cmd blob has a maximum size of whole host pages (e.g., 4k), because RAMBlocks work on full host pages. In addition, in i386 ACPI code: acpi_align_size(tables->linker->cmd_blob, ACPI_BUILD_ALIGN_SIZE); makes sure to align to multiples of 4k, padding with 0. For example, if our cmd_blob is created with a size of 2k, the maximum size is 4k - we cannot grow beyond that. Growing might be required due to guest action when rebuilding the tables, but also on incoming migration. This automatic generation of the maximum size used to be sufficient, however, there are cases where we cross host pages now when growing at runtime: we exceed the maximum size of the RAMBlock and can crash QEMU when trying to resize the resizeable memory region / RAMBlock: $ build/qemu-system-x86_64 --enable-kvm \ -machine q35,nvdimm=on \ -smp 1 \ -cpu host \ -m size=2G,slots=8,maxmem=4G \ -object memory-backend-file,id=mem0,mem-path=/tmp/nvdimm,size=256M \ -device nvdimm,label-size=131072,memdev=mem0,id=nvdimm0,slot=1 \ -nodefaults \ -device vmgenid \ -device intel-iommu Results in: Unexpected error in qemu_ram_resize() at ../softmmu/physmem.c:1850: qemu-system-x86_64: Size too large: /rom@etc/table-loader: 0x2000 > 0x1000: Invalid argument In this configuration, we consume exactly 4k (32 entries, 128 bytes each) when creating the VM. However, once the guest boots up and maps the MCFG, we also create the MCFG table and end up consuming 2 additional entries (pointer + checksum) -- which is where we try resizing the memory region / RAMBlock, however, the maximum size does not allow for it. Currently, we get the following maximum sizes for our different mutable tables based on behavior of resizeable RAMBlock: hw table max_size ------- --------------------------------------------------------- virt "etc/acpi/tables" ACPI_BUILD_TABLE_MAX_SIZE (0x200000) virt "etc/table-loader" HOST_PAGE_ALIGN(initial_size) virt "etc/acpi/rsdp" HOST_PAGE_ALIGN(initial_size) i386 "etc/acpi/tables" ACPI_BUILD_TABLE_MAX_SIZE (0x200000) i386 "etc/table-loader" HOST_PAGE_ALIGN(initial_size) i386 "etc/acpi/rsdp" HOST_PAGE_ALIGN(initial_size) microvm "etc/acpi/tables" ACPI_BUILD_TABLE_MAX_SIZE (0x200000) microvm "etc/table-loader" HOST_PAGE_ALIGN(initial_size) microvm "etc/acpi/rsdp" HOST_PAGE_ALIGN(initial_size) Let's set the maximum table size for "etc/table-loader" to 64k, so we can properly grow at runtime, which should be good enough for the future. Migration is not concerned with the maximum size of a RAMBlock, only with the used size - so existing setups are not affected. Of course, we cannot migrate a VM that would have crash when started on older QEMU from new QEMU to older QEMU without failing early on the destination when synchronizing the RAM state: qemu-system-x86_64: Size too large: /rom@etc/table-loader: 0x2000 > 0x1000: Invalid argument qemu-system-x86_64: error while loading state for instance 0x0 of device 'ram' qemu-system-x86_64: load of migration failed: Invalid argument We'll refactor the code next, to make sure we get rid of this implicit behavior for "etc/acpi/rsdp" as well and to make the code easier to grasp. Reviewed-by: Igor Mammedov <imammedo@redhat.com> Cc: Alistair Francis <alistair.francis@xilinx.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Shannon Zhao <shannon.zhaosl@gmail.com> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Laszlo Ersek <lersek@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210304105554.121674-2-david@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-03-22pci: acpi: add _DSM method to PCI devicesIgor Mammedov
Implement _DSM according to: PCI Firmware Specification 3.1 4.6.7. DSM for Naming a PCI or PCI Express Device Under Operating Systems and wire it up to cold and hot-plugged PCI devices. Feature depends on ACPI hotplug being enabled (as that provides PCI devices descriptions in ACPI and MMIO registers that are reused to fetch acpi-index). acpi-index should work for - cold plugged NICs: $QEMU -device e1000,acpi-index=100 => 'eno100' - hot-plugged (monitor) device_add e1000,acpi-index=200,id=remove_me => 'eno200' - re-plugged (monitor) device_del remove_me (monitor) device_add e1000,acpi-index=1 => 'eno1' Windows also sees index under "PCI Label Id" field in properties dialog but otherwise it doesn't seem to have any effect. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20210315180102.3008391-6-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-03-22acpi: add aml_to_decimalstring() and aml_call6() helpersIgor Mammedov
it will be used by follow up patches Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20210315180102.3008391-5-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-03-22pci: acpi: ensure that acpi-index is uniqueIgor Mammedov
it helps to avoid device naming conflicts when guest OS is configured to use acpi-index for naming. Spec ialso says so: PCI Firmware Specification Revision 3.2 4.6.7. _DSM for Naming a PCI or PCI Express Device Under Operating Systems " Instance number must be unique under \_SB scope. This instance number does not have to be sequential in a given system configuration. " Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20210315180102.3008391-4-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-03-22pci: introduce acpi-index property for PCI deviceIgor Mammedov
In x86/ACPI world, linux distros are using predictable network interface naming since systemd v197. Which on QEMU based VMs results into path based naming scheme, that names network interfaces based on PCI topology. With itm on has to plug NIC in exactly the same bus/slot, which was used when disk image was first provisioned/configured or one risks to loose network configuration due to NIC being renamed to actually used topology. That also restricts freedom to reshape PCI configuration of VM without need to reconfigure used guest image. systemd also offers "onboard" naming scheme which is preferred over PCI slot/topology one, provided that firmware implements: " PCI Firmware Specification 3.1 4.6.7. DSM for Naming a PCI or PCI Express Device Under Operating Systems " that allows to assign user defined index to PCI device, which systemd will use to name NIC. For example, using -device e1000,acpi-index=100 guest will rename NIC to 'eno100', where 'eno' is default prefix for "onboard" naming scheme. This doesn't require any advance configuration on guest side to com in effect at 'onboard' scheme takes priority over path based naming. Hope is that 'acpi-index' it will be easier to consume by management layer, compared to forcing specific PCI topology and/or having several disk image templates for different topologies and will help to simplify process of spawning VM from the same template without need to reconfigure guest NIC. This patch adds, 'acpi-index'* property and wires up a 32bit register on top of pci hotplug register block to pass index value to AML code at runtime. Following patch will add corresponding _DSM code and wire it up to PCI devices described in ACPI. *) name comes from linux kernel terminology Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20210315180102.3008391-3-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-03-22hw/sd: sdhci: Reset the data pointer of s->fifo_buffer[] when a different ↵Bin Meng
block size is programmed If the block size is programmed to a different value from the previous one, reset the data pointer of s->fifo_buffer[] so that s->fifo_buffer[] can be filled in using the new block size in the next transfer. With this fix, the following reproducer: outl 0xcf8 0x80001010 outl 0xcfc 0xe0000000 outl 0xcf8 0x80001001 outl 0xcfc 0x06000000 write 0xe000002c 0x1 0x05 write 0xe0000005 0x1 0x02 write 0xe0000007 0x1 0x01 write 0xe0000028 0x1 0x10 write 0x0 0x1 0x23 write 0x2 0x1 0x08 write 0xe000000c 0x1 0x01 write 0xe000000e 0x1 0x20 write 0xe000000f 0x1 0x00 write 0xe000000c 0x1 0x32 write 0xe0000004 0x2 0x0200 write 0xe0000028 0x1 0x00 write 0xe0000003 0x1 0x40 cannot be reproduced with the following QEMU command line: $ qemu-system-x86_64 -nographic -machine accel=qtest -m 512M \ -nodefaults -device sdhci-pci,sd-spec-version=3 \ -drive if=sd,index=0,file=null-co://,format=raw,id=mydrive \ -device sd-card,drive=mydrive -qtest stdio Cc: qemu-stable@nongnu.org Fixes: CVE-2020-17380 Fixes: CVE-2020-25085 Fixes: CVE-2021-3409 Fixes: d7dfca0807a0 ("hw/sdhci: introduce standard SD host controller") Reported-by: Alexander Bulekov <alxndr@bu.edu> Reported-by: Cornelius Aschermann (Ruhr-Universität Bochum) Reported-by: Sergej Schumilo (Ruhr-Universität Bochum) Reported-by: Simon Wörner (Ruhr-Universität Bochum) Buglink: https://bugs.launchpad.net/qemu/+bug/1892960 Buglink: https://bugs.launchpad.net/qemu/+bug/1909418 Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1928146 Tested-by: Alexander Bulekov <alxndr@bu.edu> Signed-off-by: Bin Meng <bmeng.cn@gmail.com> Message-Id: <20210303122639.20004-6-bmeng.cn@gmail.com> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
2021-03-22hw/sd: sdhci: Limit block size only when SDHC_BLKSIZE register is writableBin Meng
The codes to limit the maximum block size is only necessary when SDHC_BLKSIZE register is writable. Tested-by: Alexander Bulekov <alxndr@bu.edu> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Bin Meng <bmeng.cn@gmail.com> Message-Id: <20210303122639.20004-5-bmeng.cn@gmail.com> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
2021-03-22hw/sd: sdhci: Correctly set the controller status for ADMABin Meng
When an ADMA transfer is started, the codes forget to set the controller status to indicate a transfer is in progress. With this fix, the following 2 reproducers: https://paste.debian.net/plain/1185136 https://paste.debian.net/plain/1185141 cannot be reproduced with the following QEMU command line: $ qemu-system-x86_64 -nographic -machine accel=qtest -m 512M \ -nodefaults -device sdhci-pci,sd-spec-version=3 \ -drive if=sd,index=0,file=null-co://,format=raw,id=mydrive \ -device sd-card,drive=mydrive -qtest stdio Cc: qemu-stable@nongnu.org Fixes: CVE-2020-17380 Fixes: CVE-2020-25085 Fixes: CVE-2021-3409 Fixes: d7dfca0807a0 ("hw/sdhci: introduce standard SD host controller") Reported-by: Alexander Bulekov <alxndr@bu.edu> Reported-by: Cornelius Aschermann (Ruhr-Universität Bochum) Reported-by: Sergej Schumilo (Ruhr-Universität Bochum) Reported-by: Simon Wörner (Ruhr-Universität Bochum) Buglink: https://bugs.launchpad.net/qemu/+bug/1892960 Buglink: https://bugs.launchpad.net/qemu/+bug/1909418 Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1928146 Tested-by: Alexander Bulekov <alxndr@bu.edu> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Bin Meng <bmeng.cn@gmail.com> Message-Id: <20210303122639.20004-4-bmeng.cn@gmail.com> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
2021-03-22hw/sd: sdhci: Don't write to SDHC_SYSAD register when transfer is in progressBin Meng
Per "SD Host Controller Standard Specification Version 7.00" chapter 2.2.1 SDMA System Address Register: This register can be accessed only if no transaction is executing (i.e., after a transaction has stopped). With this fix, the following reproducer: outl 0xcf8 0x80001010 outl 0xcfc 0xfbefff00 outl 0xcf8 0x80001001 outl 0xcfc 0x06000000 write 0xfbefff2c 0x1 0x05 write 0xfbefff0f 0x1 0x37 write 0xfbefff0a 0x1 0x01 write 0xfbefff0f 0x1 0x29 write 0xfbefff0f 0x1 0x02 write 0xfbefff0f 0x1 0x03 write 0xfbefff04 0x1 0x01 write 0xfbefff05 0x1 0x01 write 0xfbefff07 0x1 0x02 write 0xfbefff0c 0x1 0x33 write 0xfbefff0e 0x1 0x20 write 0xfbefff0f 0x1 0x00 write 0xfbefff2a 0x1 0x01 write 0xfbefff0c 0x1 0x00 write 0xfbefff03 0x1 0x00 write 0xfbefff05 0x1 0x00 write 0xfbefff2a 0x1 0x02 write 0xfbefff0c 0x1 0x32 write 0xfbefff01 0x1 0x01 write 0xfbefff02 0x1 0x01 write 0xfbefff03 0x1 0x01 cannot be reproduced with the following QEMU command line: $ qemu-system-x86_64 -nographic -machine accel=qtest -m 512M \ -nodefaults -device sdhci-pci,sd-spec-version=3 \ -drive if=sd,index=0,file=null-co://,format=raw,id=mydrive \ -device sd-card,drive=mydrive -qtest stdio Cc: qemu-stable@nongnu.org Fixes: CVE-2020-17380 Fixes: CVE-2020-25085 Fixes: CVE-2021-3409 Fixes: d7dfca0807a0 ("hw/sdhci: introduce standard SD host controller") Reported-by: Alexander Bulekov <alxndr@bu.edu> Reported-by: Cornelius Aschermann (Ruhr-Universität Bochum) Reported-by: Sergej Schumilo (Ruhr-Universität Bochum) Reported-by: Simon Wörner (Ruhr-Universität Bochum) Buglink: https://bugs.launchpad.net/qemu/+bug/1892960 Buglink: https://bugs.launchpad.net/qemu/+bug/1909418 Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1928146 Tested-by: Alexander Bulekov <alxndr@bu.edu> Signed-off-by: Bin Meng <bmeng.cn@gmail.com> Message-Id: <20210303122639.20004-3-bmeng.cn@gmail.com> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
2021-03-22hw/sd: sdhci: Don't transfer any data when command time outBin Meng
At the end of sdhci_send_command(), it starts a data transfer if the command register indicates data is associated. But the data transfer should only be initiated when the command execution has succeeded. With this fix, the following reproducer: outl 0xcf8 0x80001810 outl 0xcfc 0xe1068000 outl 0xcf8 0x80001804 outw 0xcfc 0x7 write 0xe106802c 0x1 0x0f write 0xe1068004 0xc 0x2801d10101fffffbff28a384 write 0xe106800c 0x1f 0x9dacbbcad9e8f7061524334251606f7e8d9cabbac9d8e7f60514233241505f write 0xe1068003 0x28 0x80d000251480d000252280d000253080d000253e80d000254c80d000255a80d000256880d0002576 write 0xe1068003 0x1 0xfe cannot be reproduced with the following QEMU command line: $ qemu-system-x86_64 -nographic -M pc-q35-5.0 \ -device sdhci-pci,sd-spec-version=3 \ -drive if=sd,index=0,file=null-co://,format=raw,id=mydrive \ -device sd-card,drive=mydrive \ -monitor none -serial none -qtest stdio Cc: qemu-stable@nongnu.org Fixes: CVE-2020-17380 Fixes: CVE-2020-25085 Fixes: CVE-2021-3409 Fixes: d7dfca0807a0 ("hw/sdhci: introduce standard SD host controller") Reported-by: Alexander Bulekov <alxndr@bu.edu> Reported-by: Cornelius Aschermann (Ruhr-Universität Bochum) Reported-by: Sergej Schumilo (Ruhr-Universität Bochum) Reported-by: Simon Wörner (Ruhr-Universität Bochum) Buglink: https://bugs.launchpad.net/qemu/+bug/1892960 Buglink: https://bugs.launchpad.net/qemu/+bug/1909418 Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1928146 Acked-by: Alistair Francis <alistair.francis@wdc.com> Tested-by: Alexander Bulekov <alxndr@bu.edu> Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Bin Meng <bmeng.cn@gmail.com> Message-Id: <20210303122639.20004-2-bmeng.cn@gmail.com> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
2021-03-22hw/sd: sd: Actually perform the erase operationBin Meng
At present the sd_erase() does not erase the requested range of card data to 0xFFs. Let's make the erase operation actually happen. Signed-off-by: Bin Meng <bin.meng@windriver.com> Message-Id: <1613811493-58815-1-git-send-email-bmeng.cn@gmail.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
2021-03-22hw/sd: sd: Fix build error when DEBUG_SD is onBin Meng
"qemu-common.h" should be included to provide the forward declaration of qemu_hexdump() when DEBUG_SD is on. Signed-off-by: Bin Meng <bin.meng@windriver.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20210228050609.24779-1-bmeng.cn@gmail.com> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
2021-03-22virtio-pmem: fix virtio_pmem_resp assign problemWang Liang
ret in virtio_pmem_resp is a uint32_t variable, which should be assigned using virtio_stl_p. The kernel side driver does not guarantee virtio_pmem_resp to be initialized to zero in advance, So sometimes the flush operation will fail. Signed-off-by: Wang Liang <wangliangzz@inspur.com> Message-Id: <20210317024145.271212-1-wangliangzz@126.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-03-22vhost-user: Monitor slave channel in vhost_user_read()Greg Kurz
Now that everything is in place, have the nested event loop to monitor the slave channel. The source in the main event loop is destroyed and recreated to ensure any pending even for the slave channel that was previously detected is purged. This guarantees that the main loop wont invoke slave_read() based on an event that was already handled by the nested loop. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20210312092212.782255-7-groug@kaod.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-03-22vhost-user: Introduce nested event loop in vhost_user_read()Greg Kurz
A deadlock condition potentially exists if a vhost-user process needs to request something to QEMU on the slave channel while processing a vhost-user message. This doesn't seem to affect any vhost-user implementation so far, but this is currently biting the upcoming enablement of DAX with virtio-fs. The issue is being observed when the guest does an emergency reboot while a mapping still exits in the DAX window, which is very easy to get with a busy enough workload (e.g. as simulated by blogbench [1]) : - QEMU sends VHOST_USER_GET_VRING_BASE to virtiofsd. - In order to complete the request, virtiofsd then asks QEMU to remove the mapping on the slave channel. All these dialogs are synchronous, hence the deadlock. As pointed out by Stefan Hajnoczi: When QEMU's vhost-user master implementation sends a vhost-user protocol message, vhost_user_read() does a "blocking" read during which slave_fd is not monitored by QEMU. The natural solution for this issue is an event loop. The main event loop cannot be nested though since we have no guarantees that its fd handlers are prepared for re-entrancy. Introduce a new event loop that only monitors the chardev I/O for now in vhost_user_read() and push the actual reading to a one-shot handler. A subsequent patch will teach the loop to monitor and process messages from the slave channel as well. [1] https://github.com/jedisct1/Blogbench Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20210312092212.782255-6-groug@kaod.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-03-22vhost-user: Convert slave channel to QIOChannelSocketGreg Kurz
The slave channel is implemented with socketpair() : QEMU creates the pair, passes one of the socket to virtiofsd and monitors the other one with the main event loop using qemu_set_fd_handler(). In order to fix a potential deadlock between QEMU and a vhost-user external process (e.g. virtiofsd with DAX), we want to be able to monitor and service the slave channel while handling vhost-user requests. Prepare ground for this by converting the slave channel to be a QIOChannelSocket. This will make monitoring of the slave channel as simple as calling qio_channel_add_watch_source(). Since the connection is already established between the two sockets, only incoming I/O (G_IO_IN) and disconnect (G_IO_HUP) need to be serviced. This also allows to get rid of the ancillary data parsing since QIOChannelSocket can do this for us. Note that the MSG_CTRUNC check is dropped on the way because QIOChannelSocket ignores this case. This isn't a problem since slave_read() provisions space for 8 file descriptors, but affected vhost-user slave protocol messages generally only convey one. If for some reason a buggy implementation passes more file descriptors, no need to break the connection, just like we don't break it if some other type of ancillary data is received : this isn't explicitely violating the protocol per-se so it seems better to ignore it. The current code errors out on short reads and writes. Use the qio_channel_*_all() variants to address this on the way. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20210312092212.782255-5-groug@kaod.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-03-22vhost-user: Factor out duplicated slave_fd teardown codeGreg Kurz
Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20210312092212.782255-4-groug@kaod.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-03-22vhost-user: Fix double-close on slave_read() error pathGreg Kurz
Some message types, e.g. VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG, can convey file descriptors. These must be closed before returning from slave_read() to avoid being leaked. This can currently be done in two different places: [1] just after the request has been processed [2] on the error path, under the goto label err: These path are supposed to be mutually exclusive but they are not actually. If the VHOST_USER_NEED_REPLY_MASK flag was passed and the sending of the reply fails, both [1] and [2] are performed with the same descriptor values. This can potentially cause subtle bugs if one of the descriptor was recycled by some other thread in the meantime. This code duplication complicates rollback for no real good benefit. Do the closing in a unique place, under a new fdcleanup: goto label at the end of the function. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20210312092212.782255-3-groug@kaod.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-03-22vhost-user: Drop misleading EAGAIN checks in slave_read()Greg Kurz
slave_read() checks EAGAIN when reading or writing to the socket fails. This gives the impression that the slave channel is in non-blocking mode, which is certainly not the case with the current code base. And the rest of the code isn't actually ready to cope with non-blocking I/O. Just drop the checks everywhere in this function for the sake of clarity. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20210312092212.782255-2-groug@kaod.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-03-22virtio: Fix virtio_mmio_read()/virtio_mmio_write()Laurent Vivier
Both functions don't check the personality of the interface (legacy or modern) before accessing the configuration memory and always use virtio_config_readX()/virtio_config_writeX(). With this patch, they now check the personality and in legacy mode call virtio_config_readX()/virtio_config_writeX(), otherwise call virtio_config_modern_readX()/virtio_config_modern_writeX(). This change has been tested with virtio-mmio guests (virt stretch/armhf and virt sid/m68k) and virtio-pci guests (pseries RHEL-7.3/ppc64 and /ppc64le). Signed-off-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20210314200300.3259170-1-laurent@vivier.eu> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-03-22hw/net: virtio-net: Initialize nc->do_not_pad to trueBin Meng
For virtio-net, there is no need to pad the Ethernet frame size to 60 bytes before sending to it. Signed-off-by: Bin Meng <bmeng.cn@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com>