aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-07-22tests/qtest/bios-tables-test.c: Remove the fall back pathSunil V L
The expected ACPI AML files are moved now under ${arch}/{machine} path. Hence, there is no need to search in old path which didn't have ${arch}. Remove the code which searches for the expected AML files under old path as well. Suggested-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> Acked-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20240716144306.2432257-7-sunilvl@ventanamicro.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22tests/acpi: update expected DSDT blob for aarch64 and microvmSunil V L
After PCI link devices are moved out of the scope of PCI root complex, the DSDT files of machines which use GPEX, will change. So, update the expected AML files with these changes for these machines. Mainly, there are 2 changes. 1) Since the link devices are created now directly under _SB for all PCI root bridges in the system, they should have unique names. So, instead of GSIx, named those devices as LXXY where L means link, XX will have PCI bus number and Y will have the INTx number (ex: L000 or L001). The _PRT entries will also be updated to reflect this name change. 2) PCI link devices are moved from the scope of each PCI root bridge to directly under _SB. Below is the sample iASL difference for one such link device. Scope (\_SB) { Name (_HID, "LNRO0005") // _HID: Hardware ID Name (_UID, 0x1F) // _UID: Unique ID Name (_CCA, One) // _CCA: Cache Coherency Attribute Name (_CRS, ResourceTemplate () // _CRS: Current Resource Settings { Memory32Fixed (ReadWrite, 0x0A003E00, // Address Base 0x00000200, // Address Length ) Interrupt (ResourceConsumer, Level, ActiveHigh, Exclusive, ,, ) { 0x0000004F, } }) + Device (L000) + { + Name (_HID, "PNP0C0F" /* PCI Interrupt Link Device */) + Name (_UID, Zero) // _UID: Unique ID + Name (_PRS, ResourceTemplate () + { + Interrupt (ResourceConsumer, Level, ActiveHigh, Exclusive, ,, ) + { + 0x00000023, + } + }) + Name (_CRS, ResourceTemplate () + { + Interrupt (ResourceConsumer, Level, ActiveHigh, Exclusive, ,, ) + { + 0x00000023, + } + }) + Method (_SRS, 1, NotSerialized) // _SRS: Set Resource Settings + { + } + } + Device (PCI0) { Name (_HID, "PNP0A08" /* PCI Express Bus */) // _HID: Hardware ID Name (_CID, "PNP0A03" /* PCI Bus */) // _CID: Compatible ID Name (_SEG, Zero) // _SEG: PCI Segment Name (_BBN, Zero) // _BBN: BIOS Bus Number Name (_UID, Zero) // _UID: Unique ID Name (_STR, Unicode ("PCIe 0 Device")) // _STR: Description String Name (_CCA, One) // _CCA: Cache Coherency Attribute Name (_PRT, Package (0x80) // _PRT: PCI Routing Table { Package (0x04) { 0xFFFF, Zero, - GSI0, + L000, Zero }, ..... }) Device (GSI0) { Name (_HID, "PNP0C0F" /* PCI Interrupt Link Device */) Name (_UID, Zero) // _UID: Unique ID Name (_PRS, ResourceTemplate () { Interrupt (ResourceConsumer, Level, ActiveHigh, Exclusive, ,, ) { 0x00000023, } }) Name (_CRS, ResourceTemplate () { Interrupt (ResourceConsumer, Level, ActiveHigh, Exclusive, ,, ) { 0x00000023, } }) Method (_SRS, 1, NotSerialized) // _SRS: Set Resource Settings { } } } } Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> Message-Id: <20240716144306.2432257-6-sunilvl@ventanamicro.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22acpi/gpex: Create PCI link devices outside PCI root bridgeSunil V L
Currently, PCI link devices (PNP0C0F) are always created within the scope of the PCI root bridge. However, RISC-V needs these link devices to be created outside to ensure the probing order in the OS. This matches the example given in the ACPI specification [1] as well. Hence, create these link devices directly under _SB instead of under the PCI root bridge. To keep these link device names unique for multiple PCI bridges, change the device name from GSIx to LXXY format where XX is the PCI bus number and Y is the INTx. GPEX is currently used by riscv, aarch64/virt and x86/microvm machines. So, this change will alter the DSDT for those systems. [1] - ACPI 5.1: 6.2.13.1 Example: Using _PRT to Describe PCI IRQ Routing Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> Acked-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20240716144306.2432257-5-sunilvl@ventanamicro.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22tests/acpi: Allow DSDT acpi table changes for aarch64Sunil V L
so that CI tests don't fail when those ACPI tables are updated in the next patch. This is as per the documentation in bios-tables-tests.c. Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20240716144306.2432257-4-sunilvl@ventanamicro.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22hw/riscv/virt-acpi-build.c: Update the HID of RISC-V UARTSunil V L
The requirement ACPI_060 in the RISC-V BRS specification [1], requires NS16550 compatible UART to have the HID RSCV0003. So, update the HID for the UART. [1] - https://github.com/riscv-non-isa/riscv-brs/releases/download/v0.0.2/riscv-brs-spec.pdf (Chapter 6) Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> Acked-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20240716144306.2432257-3-sunilvl@ventanamicro.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22hw/riscv/virt-acpi-build.c: Add namespace devices for PLIC and APLICSunil V L
As per the requirement ACPI_080 in the RISC-V Boot and Runtime Services (BRS) specification [1], PLIC and APLIC should be in namespace as well. So, add them using the defined HID. [1] - https://github.com/riscv-non-isa/riscv-brs/releases/download/v0.0.2/riscv-brs-spec.pdf (Chapter 6) Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> Acked-by: Alistair Francis <alistair.francis@wdc.com> Acked-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20240716144306.2432257-2-sunilvl@ventanamicro.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22virtio-iommu: Add trace point on virtio_iommu_detach_endpoint_from_domainEric Auger
Add a trace point on virtio_iommu_detach_endpoint_from_domain(). Signed-off-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20240716094619.1713905-7-eric.auger@redhat.com> Tested-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22hw/vfio/common: Add vfio_listener_region_del_iommu trace eventEric Auger
Trace when VFIO gets notified about the deletion of an IOMMU MR. Also trace the name of the region in the add_iommu trace message. Signed-off-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20240716094619.1713905-6-eric.auger@redhat.com> Tested-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22virtio-iommu: Remove the end point on detachEric Auger
We currently miss the removal of the endpoint in case of detach. Signed-off-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20240716094619.1713905-5-eric.auger@redhat.com> Tested-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22virtio-iommu: Free [host_]resv_ranges on unset_iommu_devicesEric Auger
We are currently missing the deallocation of the [host_]resv_regions in case of hot unplug. Also to make things more simple let's rule out the case where multiple HostIOMMUDevices would be aliased and attached to the same IOMMUDevice. This allows to remove the handling of conflicting Host reserved regions. Anyway this is not properly supported at guest kernel level. On hotunplug the reserved regions are reset to the ones set by virtio-iommu property. Signed-off-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20240716094619.1713905-4-eric.auger@redhat.com> Tested-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22virtio-iommu: Remove probe_doneEric Auger
Now we have switched to PCIIOMMUOps to convey host IOMMU information, the host reserved regions are transmitted when the PCIe topology is built. This happens way before the virtio-iommu driver calls the probe request. So let's remove the probe_done flag that allowed to check the probe was not done before the IOMMU MR got enabled. Besides this probe_done flag had a flaw wrt migration since it was not saved/restored. The only case at risk is if 2 devices were plugged to a PCIe to PCI bridge and thus aliased. First of all we discovered in the past this case was not properly supported for neither SMMU nor virtio-iommu on guest kernel side: see [RFC] virtio-iommu: Take into account possible aliasing in virtio_iommu_mr() https://lore.kernel.org/all/20230116124709.793084-1-eric.auger@redhat.com/ If this were supported by the guest kernel, it is unclear what the call sequence would be from a virtio-iommu driver point of view. Signed-off-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20240716094619.1713905-3-eric.auger@redhat.com> Tested-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22Revert "virtio-iommu: Clear IOMMUDevice when VFIO device is unplugged"Eric Auger
This reverts commit 1b889d6e39c32d709f1114699a014b381bcf1cb1. There are different problems with that tentative fix: - Some resources are left dangling (resv_regions, host_resv_ranges) and memory subregions are left attached to the root MR although freed as embedded in the sdev IOMMUDevice. Finally the sdev->as is not destroyed and associated listeners are left. - Even when fixing the above we observe a memory corruption associated with the deallocation of the IOMMUDevice. This can be observed when a VFIO device is hotplugged, hot-unplugged and a system reset is issued. At this stage we have not been able to identify the root cause (IOMMU MR or as structs beeing overwritten and used later on?). - Another issue is HostIOMMUDevice are indexed by non aliased BDF whereas the IOMMUDevice is indexed by aliased BDF - yes the current naming is really misleading -. Given the state of the code I don't think the virtio-iommu device works in non singleton group case though. So let's revert the patch for now. This means the IOMMU MR/as survive the hotunplug. This is what is done in the intel_iommu for instance. It does not sound very logical to keep those but currently there is no symetric function to pci_device_iommu_address_space(). probe_done issue will be handled in a subsequent patch. Also resv_regions and host_resv_regions will be deallocated separately. Signed-off-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20240716094619.1713905-2-eric.auger@redhat.com> Tested-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22gdbstub: Add helper function to unregister GDB register spaceSalil Mehta
Add common function to help unregister the GDB register space. This shall be done in context to the CPU unrealization. Note: These are common functions exported to arch specific code. For example, for ARM this code is being referred in associated arch specific patch-set: Link: https://lore.kernel.org/qemu-devel/20230926103654.34424-1-salil.mehta@huawei.com/ Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Xianglai Li <lixianglai@loongson.cn> Tested-by: Miguel Luis <miguel.luis@oracle.com> Reviewed-by: Shaoqin Huang <shahuang@redhat.com> Reviewed-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com> Tested-by: Zhao Liu <zhao1.liu@intel.com> Acked-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20240716111502.202344-8-salil.mehta@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22physmem: Add helper function to destroy CPU AddressSpaceSalil Mehta
Virtual CPU Hot-unplug leads to unrealization of a CPU object. This also involves destruction of the CPU AddressSpace. Add common function to help destroy the CPU AddressSpace. Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Xianglai Li <lixianglai@loongson.cn> Tested-by: Miguel Luis <miguel.luis@oracle.com> Reviewed-by: Shaoqin Huang <shahuang@redhat.com> Tested-by: Zhao Liu <zhao1.liu@intel.com> Acked-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20240716111502.202344-7-salil.mehta@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22hw/acpi: Update CPUs AML with cpu-(ctrl)dev changeSalil Mehta
CPUs Control device(\\_SB.PCI0) register interface for the x86 arch is IO port based and existing CPUs AML code assumes _CRS objects would evaluate to a system resource which describes IO Port address. But on ARM arch CPUs control device(\\_SB.PRES) register interface is memory-mapped hence _CRS object should evaluate to system resource which describes memory-mapped base address. Update build CPUs AML function to accept both IO/MEMORY region spaces and accordingly update the _CRS object. Co-developed-by: Keqian Zhu <zhukeqian1@huawei.com> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Xianglai Li <lixianglai@loongson.cn> Tested-by: Miguel Luis <miguel.luis@oracle.com> Reviewed-by: Shaoqin Huang <shahuang@redhat.com> Tested-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20240716111502.202344-6-salil.mehta@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22hw/acpi: Update GED _EVT method AML with CPU scanSalil Mehta
OSPM evaluates _EVT method to map the event. The CPU hotplug event eventually results in start of the CPU scan. Scan figures out the CPU and the kind of event(plug/unplug) and notifies it back to the guest. Update the GED AML _EVT method with the call to method \\_SB.CPUS.CSCN (via \\_SB.GED.CSCN) Architecture specific code [1] might initialize its CPUs AML code by calling common function build_cpus_aml() like below for ARM: build_cpus_aml(scope, ms, opts, xx_madt_cpu_entry, memmap[VIRT_CPUHP_ACPI].base, "\\_SB", "\\_SB.GED.CSCN", AML_SYSTEM_MEMORY); [1] https://lore.kernel.org/qemu-devel/20240613233639.202896-13-salil.mehta@huawei.com/ Co-developed-by: Keqian Zhu <zhukeqian1@huawei.com> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com> Tested-by: Xianglai Li <lixianglai@loongson.cn> Tested-by: Miguel Luis <miguel.luis@oracle.com> Reviewed-by: Shaoqin Huang <shahuang@redhat.com> Tested-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20240716111502.202344-5-salil.mehta@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22hw/acpi: Update ACPI GED framework to support vCPU HotplugSalil Mehta
ACPI GED (as described in the ACPI 6.4 spec) uses an interrupt listed in the _CRS object of GED to intimate OSPM about an event. Later then demultiplexes the notified event by evaluating ACPI _EVT method to know the type of event. Use ACPI GED to also notify the guest kernel about any CPU hot(un)plug events. Note, GED interface is used by many hotplug events like memory hotplug, NVDIMM hotplug and non-hotplug events like system power down event. Each of these can be selected using a bit in the 32 bit GED IO interface. A bit has been reserved for the CPU hotplug event. ACPI CPU hotplug related initialization should only happen if ACPI_CPU_HOTPLUG support has been enabled for particular architecture. Add cpu_hotplug_hw_init() stub to avoid compilation break. Co-developed-by: Keqian Zhu <zhukeqian1@huawei.com> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Shaoqin Huang <shahuang@redhat.com> Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com> Tested-by: Xianglai Li <lixianglai@loongson.cn> Tested-by: Miguel Luis <miguel.luis@oracle.com> Reviewed-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com> Tested-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Message-Id: <20240716111502.202344-4-salil.mehta@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Igor Mammedov <imammedo@redhat.com>
2024-07-22hw/acpi: Move CPU ctrl-dev MMIO region len macro to common header fileSalil Mehta
CPU ctrl-dev MMIO region length could be used in ACPI GED and various other architecture specific places. Move ACPI_CPU_HOTPLUG_REG_LEN macro to more appropriate common header file. Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Shaoqin Huang <shahuang@redhat.com> Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com> Tested-by: Xianglai Li <lixianglai@loongson.cn> Tested-by: Miguel Luis <miguel.luis@oracle.com> Tested-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20240716111502.202344-3-salil.mehta@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22accel/kvm: Extract common KVM vCPU {creation,parking} codeSalil Mehta
KVM vCPU creation is done once during the vCPU realization when Qemu vCPU thread is spawned. This is common to all the architectures as of now. Hot-unplug of vCPU results in destruction of the vCPU object in QOM but the corresponding KVM vCPU object in the Host KVM is not destroyed as KVM doesn't support vCPU removal. Therefore, its representative KVM vCPU object/context in Qemu is parked. Refactor architecture common logic so that some APIs could be reused by vCPU Hotplug code of some architectures likes ARM, Loongson etc. Update new/old APIs with trace events. New APIs qemu_{create,park,unpark}_vcpu() can be externally called. No functional change is intended here. Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Xianglai Li <lixianglai@loongson.cn> Tested-by: Miguel Luis <miguel.luis@oracle.com> Reviewed-by: Shaoqin Huang <shahuang@redhat.com> Reviewed-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20240716111502.202344-2-salil.mehta@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22smbios: make memory device size configurable per MachineIgor Mammedov
Currently QEMU describes initial[1] RAM* in SMBIOS as a series of virtual DIMMs (capped at 16Gb max) using type 17 structure entries. Which is fine for the most cases. However when starting guest with terabytes of RAM this leads to too many memory device structures, which eventually upsets linux kernel as it reserves only 64K for these entries and when that border is crossed out it runs out of reserved memory. Instead of partitioning initial RAM on 16Gb DIMMs, use maximum possible chunk size that SMBIOS spec allows[2]. Which lets encode RAM in lower 31 bits of 32bit field (which amounts upto 2047Tb per DIMM). As result initial RAM will generate only one type 17 structure until host/guest reach ability to use more RAM in the future. Compat changes: We can't unconditionally change chunk size as it will break QEMU<->guest ABI (and migration). Thus introduce a new machine class field that would let older versioned machines to use legacy 16Gb chunks, while new(er) machine type[s] use maximum possible chunk size. PS: While it might seem to be risky to rise max entry size this large (much beyond of what current physical RAM modules support), I'd not expect it causing much issues, modulo uncovering bugs in software running within guest. And those should be fixed on guest side to handle SMBIOS spec properly, especially if guest is expected to support so huge RAM configs. In worst case, QEMU can reduce chunk size later if we would care enough about introducing a workaround for some 'unfixable' guest OS, either by fixing up the next machine type or giving users a CLI option to customize it. 1) Initial RAM - is RAM configured with help '-m SIZE' CLI option/ implicitly defined by machine. It doesn't include memory configured with help of '-device' option[s] (pcdimm,nvdimm,...) 2) SMBIOS 3.1.0 7.18.5 Memory Device — Extended Size PS: * tested on 8Tb host with RHEL6 guest, which seems to parse type 17 SMBIOS table entries correctly (according to 'dmidecode'). Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20240715122417.4059293-1-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22docs: Document composable SR-IOV deviceAkihiko Odaki
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-Id: <20240715-sriov-v5-8-3f5539093ffc@daynix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22virtio-net: Implement SR-IOV VFAkihiko Odaki
A virtio-net device can be added as a SR-IOV VF to another virtio-pci device that will be the PF. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-Id: <20240715-sriov-v5-7-3f5539093ffc@daynix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22virtio-pci: Implement SR-IOV PFAkihiko Odaki
Allow user to attach SR-IOV VF to a virtio-pci PF. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-Id: <20240715-sriov-v5-6-3f5539093ffc@daynix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22pcie_sriov: Allow user to create SR-IOV deviceAkihiko Odaki
A user can create a SR-IOV device by specifying the PF with the sriov-pf property of the VFs. The VFs must be added before the PF. A user-creatable VF must have PCIDeviceClass::sriov_vf_user_creatable set. Such a VF cannot refer to the PF because it is created before the PF. A PF that user-creatable VFs can be attached calls pcie_sriov_pf_init_from_user_created_vfs() during realization and pcie_sriov_pf_exit() when exiting. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-Id: <20240715-sriov-v5-5-3f5539093ffc@daynix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22pcie_sriov: Check PCI Express for SR-IOV PFAkihiko Odaki
SR-IOV requires PCI Express. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-Id: <20240715-sriov-v5-4-3f5539093ffc@daynix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22pcie_sriov: Ensure PF and VF are mutually exclusiveAkihiko Odaki
A device cannot be a SR-IOV PF and a VF at the same time. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-Id: <20240715-sriov-v5-3-3f5539093ffc@daynix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22hw/pci: Fix SR-IOV VF number calculationAkihiko Odaki
pci_config_get_bar_addr() had a division by vf_stride. vf_stride needs to be non-zero when there are multiple VFs, but the specification does not prohibit to make it zero when there is only one VF. Do not perform the division for the first VF to avoid division by zero. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-Id: <20240715-sriov-v5-2-3f5539093ffc@daynix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-23Merge tag 'pull-request-2024-07-22' of https://gitlab.com/thuth/qemu into ↵Richard Henderson
staging * Minor clean-ups and fixes for the qtests and Avocado tests * Fix crash that happens when introspecting scsi-block on older machine types * s390x: filter deprecated properties based on model expansion type # -----BEGIN PGP SIGNATURE----- # # iQJFBAABCAAvFiEEJ7iIR+7gJQEY8+q5LtnXdP5wLbUFAmaeSUMRHHRodXRoQHJl # ZGhhdC5jb20ACgkQLtnXdP5wLbVdQw/8DvGymXKwpS0F2aSHg3AZvjSCpkv3Y+fK # myQrzh30cv9Vhe/Y9do47HpfJ6Ug9SK6xG64K2o+BIW+G3+ZUwSHk24PoiALsrJf # 9qqya1upBJkEC5B4PhqRPS3GlbvBnKKEk8W6BMpUa2BToFV9MsG256cBVhUrRpGc # 6u80DgTNxCI1czsNkWVGJAt1oVLYYJIjz7UZ4VbZCH48o6r0iSUV6C01wccOFmNy # IXbspyyUftWFh9lO0i8PiYlXG2YEAmFry3gqD5vc+6BsFT4lMeoRFFxbVCddGKFc # iNwlH4ayjeISlEJeClImIdbHyZ+sDhPyy5x4cpQqmZudEPn+GVnZ0arm7OvXW/k8 # Yog4n7/cUz7GHnWbqYIFZMS1g1wmqm/9VPsVTzXAlTva4dTTs2p0tKAADHIAtPCI # jxSPpbuCuukDzUZGsNZyRGbex6g4B0tP4TMHRFxo5LVy9dKn2BLOHBWuzPevD9OO # FphZHUuGngcPi4GSFmlv7aCS0pqyWsCO+5EqoYUgO8yadyfiXN9pwjB6OnBZux0U # kbJOkkBJwEalhsiHmPFMnS8rkWa4Ye4ZJjj8XHRiecxSZOcNOcxyE+l2x8CV2aFB # UBR83nm86vXXpu86Yod3E+txDEUzKN5+B8X0q7Se0YvsWbB+1Dq/Co0Bdh/Wp70E # EPk5eqaSp8k= # =zB5F # -----END PGP SIGNATURE----- # gpg: Signature made Mon 22 Jul 2024 09:57:55 PM AEST # gpg: using RSA key 27B88847EEE0250118F3EAB92ED9D774FE702DB5 # gpg: issuer "thuth@redhat.com" # gpg: Good signature from "Thomas Huth <th.huth@gmx.de>" [full] # gpg: aka "Thomas Huth <thuth@redhat.com>" [full] # gpg: aka "Thomas Huth <th.huth@posteo.de>" [unknown] # gpg: aka "Thomas Huth <huth@tuxfamily.org>" [full] * tag 'pull-request-2024-07-22' of https://gitlab.com/thuth/qemu: target/s390x: filter deprecated properties based on model expansion type tests: increase timeout per instance of bios-tables-test qtest/fuzz: make range overlap check more readable hw: Fix crash that happens when introspecting scsi-block on older machine types tests/avocado/machine_aspeed.py: Increase timeout for TPM test tests/avocado: Remove the remainders of the virtiofs_submounts test tests/avocado/mem-addr-space-check: Remove unused "import signal" tests/avocado: Move LinuxTest related code into a separate file tests/avocado: Allow overwriting AVOCADO_SHOW env variable tests/avocado/boot_xen.py: use class attribute tests/avocado/boot_xen.py: unify tags tests/avocado/boot_xen.py: merge base classes Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-07-22chardev/char-win-stdio.c: restore old console modesongziming
If I use `-serial stdio` on Windows, after QEMU exits, the terminal could not handle arrow keys and tab any more. Because stdio backend on Windows sets console mode to virtual terminal input when starts, but does not restore the old mode when finalize. This small patch saves the old console mode and set it back. Signed-off-by: Ziming Song <s.ziming@hotmail.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-ID: <ME3P282MB25488BE7C39BF0C35CD0DA5D8CA82@ME3P282MB2548.AUSP282.PROD.OUTLOOK.COM>
2024-07-22hpet: avoid timer storms on periodic timersPaolo Bonzini
If the period is set to a value that is too low, there could be no time left to run the rest of QEMU. Do not trigger interrupts faster than 1 MHz. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-22hpet: store full 64-bit target value of the counterPaolo Bonzini
Store the full 64-bit value at which the timer should fire. This makes it possible to skip the imprecise hpet_calculate_diff() step, and to remove the clamping of the period to 31 or 63 bits. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-22hpet: accept 64-bit reads and writesPaolo Bonzini
Declare the MemoryRegionOps so that 64-bit reads and writes to the HPET are received directly. This makes it possible to unify the code to process low and high parts: for 32-bit reads, extract the desired word; for 32-bit writes, just merge the desired part into the old value and proceed as with a 64-bit write. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-22hpet: place read-only bits directly in "new_val"Paolo Bonzini
The variable "val" is used for two different purposes. As an intermediate value when writing configuration registers, and to store the cleared bits when writing ISR. Use "new_val" for the former, and rename the variable so that it is clearer for the latter case. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-22hpet: remove unnecessary variable "index"Paolo Bonzini
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-22hpet: ignore high bits of comparator in 32-bit modePaolo Bonzini
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-22hpet: fix and cleanup persistence of interrupt statusPaolo Bonzini
There are several bugs in the handling of the ISR register: - switching level->edge was not lowering the interrupt and clearing ISR - switching on the enable bit was not raising a level-triggered interrupt if the timer had fired - the timer must be kept running even if not enabled, in order to set the ISR flag, so writes to HPET_TN_CFG must not call hpet_del_timer() Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-22Add support for RAPL MSRs in KVM/QemuAnthony Harivel
Starting with the "Sandy Bridge" generation, Intel CPUs provide a RAPL interface (Running Average Power Limit) for advertising the accumulated energy consumption of various power domains (e.g. CPU packages, DRAM, etc.). The consumption is reported via MSRs (model specific registers) like MSR_PKG_ENERGY_STATUS for the CPU package power domain. These MSRs are 64 bits registers that represent the accumulated energy consumption in micro Joules. They are updated by microcode every ~1ms. For now, KVM always returns 0 when the guest requests the value of these MSRs. Use the KVM MSR filtering mechanism to allow QEMU handle these MSRs dynamically in userspace. To limit the amount of system calls for every MSR call, create a new thread in QEMU that updates the "virtual" MSR values asynchronously. Each vCPU has its own vMSR to reflect the independence of vCPUs. The thread updates the vMSR values with the ratio of energy consumed of the whole physical CPU package the vCPU thread runs on and the thread's utime and stime values. All other non-vCPU threads are also taken into account. Their energy consumption is evenly distributed among all vCPUs threads running on the same physical CPU package. To overcome the problem that reading the RAPL MSR requires priviliged access, a socket communication between QEMU and the qemu-vmsr-helper is mandatory. You can specified the socket path in the parameter. This feature is activated with -accel kvm,rapl=true,path=/path/sock.sock Actual limitation: - Works only on Intel host CPU because AMD CPUs are using different MSR adresses. - Only the Package Power-Plane (MSR_PKG_ENERGY_STATUS) is reported at the moment. Signed-off-by: Anthony Harivel <aharivel@redhat.com> Link: https://lore.kernel.org/r/20240522153453.1230389-4-aharivel@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-22hw/nvme: remove useless type castYao Xingtao
The type of req->cmd is NvmeCmd, cast the pointer of this type to NvmeCmd* is useless. Signed-off-by: Yao Xingtao <yaoxt.fnst@fujitsu.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2024-07-22hw/nvme: actually implement abortAyush Mishra
Abort was not implemented previously, but we can implement it for AERs and asynchrnously for I/O. Signed-off-by: Ayush Mishra <ayush.m55@samsung.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2024-07-22hw/nvme: add cross namespace copy supportArun Kumar
Extend copy command to copy user data across different namespaces via support for specifying a namespace for each source range Signed-off-by: Arun Kumar <arun.kka@samsung.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2024-07-22target/s390x: filter deprecated properties based on model expansion typeCollin Walling
Currently, there is no way to execute the query-cpu-model-expansion command to retrieve a comprehenisve list of deprecated properties, as the result is dependent per-model. To enable this, the expansion output is modified as such: When reporting a "full" CPU model, show the *entire* list of deprecated properties regardless if they are supported on the model. A full expansion outputs all known CPU model properties anyway, so it makes sense to report all deprecated properties here too. This allows management apps to query a single model (e.g. host) to acquire the full list of deprecated properties. Additionally, when reporting a "static" CPU model, the command will only show deprecated properties that are a subset of the model's *enabled* properties. This is more accurate than how the query was handled before, which blindly reported deprecated properties that were never otherwise introduced for certain models. Acked-by: David Hildenbrand <david@redhat.com> Suggested-by: Jiri Denemark <jdenemar@redhat.com> Signed-off-by: Collin Walling <walling@linux.ibm.com> Message-ID: <20240719181741.35146-1-walling@linux.ibm.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-07-22tests: increase timeout per instance of bios-tables-testIgor Mammedov
CI often fails 'cross-i686-tci' job due to runner slowness Log shows that test almost complete, with a few remaining when bios-tables-test timeout hits: 19/270 qemu:qtest+qtest-aarch64 / qtest-aarch64/bios-tables-test TIMEOUT 610.02s killed by signal 15 SIGTERM ... stderr: TAP parsing error: Too few tests run (expected 8, got 7) At the same time overall job running time is only ~30 out of 1hr allowed. Increase bios-tables-test instance timeout on 5min as a fix for slow CI runners. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-ID: <20240716125930.620861-1-imammedo@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-07-22qtest/fuzz: make range overlap check more readableYao Xingtao
use ranges_overlap() instead of open-coding the overlap check to improve the readability of the code. Signed-off-by: Yao Xingtao <yaoxt.fnst@fujitsu.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Alexander Bulekov <alxndr@bu.edu> Message-ID: <20240722040742.11513-8-yaoxt.fnst@fujitsu.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-07-22hw: Fix crash that happens when introspecting scsi-block on older machine typesThomas Huth
"make check SPEED=slow" is currently failing the device-introspect-test on older machine types since introspecting "scsi-block" is causing an abort: $ ./qemu-system-x86_64 -M pc-q35-8.0 -monitor stdio QEMU 9.0.50 monitor - type 'help' for more information (qemu) device_add scsi-block,help Unexpected error in object_property_find_err() at ../../devel/qemu/qom/object.c:1357: can't apply global scsi-disk-base.migrate-emulated-scsi-request=false: Property 'scsi-block.migrate-emulated-scsi-request' not found Aborted (core dumped) The problem is that the compat code tries to change the "migrate-emulated-scsi-request" property for all devices that are derived from "scsi-block", but the property has only been added to "scsi-hd" and "scsi-cd" via the DEFINE_SCSI_DISK_PROPERTIES macro. Thus let's fix the problem by only changing the property on the devices that really have this property. Fixes: b4912afa5f ("scsi-disk: Fix crash for VM configured with USB CDROM after live migration") Message-ID: <20240703090904.909720-1-thuth@redhat.com> Acked-by: Hyman Huang <yong.huang@smartx.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-07-22tests/avocado/machine_aspeed.py: Increase timeout for TPM testCédric Le Goater
On some runners, test_arm_ast2600_evb_buildroot_tpm can take longer than 90s to complete. Increase timeout for these. Reported-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com> Message-ID: <20240722085547.90650-1-clg@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-07-22tests/avocado: Remove the remainders of the virtiofs_submounts testThomas Huth
The virtiofs_submounts test has been removed in commit 5da7701e2a ("virtiofsd: Remove test"), so we don't need this files anymore. Message-ID: <20240718173125.489901-1-thuth@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-07-22tests/avocado/mem-addr-space-check: Remove unused "import signal"Thomas Huth
The "signal" module is not used here, so we can remove this import statement. Message-ID: <20240719095408.33298-1-thuth@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-07-22tests/avocado: Move LinuxTest related code into a separate fileThomas Huth
Only some few tests are using the LinuxTest class. Move the related code into a separate file so that this does not pollute the main namespace. Message-ID: <20240719095031.32814-1-thuth@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-07-22tests/avocado: Allow overwriting AVOCADO_SHOW env variablePhilippe Mathieu-Daudé
The 'app' level logging is useful, but sometimes we want more, for example QEMU leverages the 'console' logging. Allow overwriting AVOCADO_SHOW from environment, i.e.: $ make check-avocado AVOCADO_SHOW='app,console' Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-ID: <20240719180211.48073-1-philmd@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-07-22tools: build qemu-vmsr-helperAnthony Harivel
Introduce a privileged helper to access RAPL MSR. The privileged helper tool, qemu-vmsr-helper, is designed to provide virtual machines with the ability to read specific RAPL (Running Average Power Limit) MSRs without requiring CAP_SYS_RAWIO privileges or relying on external, out-of-tree patches. The helper tool leverages Unix permissions and SO_PEERCRED socket options to enforce access control, ensuring that only processes explicitly requesting read access via readmsr() from a valid Thread ID can access these MSRs. The list of RAPL MSRs that are allowed to be read by the helper tool is defined in rapl-msr-index.h. This list corresponds to the RAPL MSRs that will be supported in the next commit titled "Add support for RAPL MSRs in KVM/QEMU." The tool is intentionally designed to run on the Linux x86 platform. This initial implementation is tailored for Intel CPUs but can be extended to support AMD CPUs in the future. Signed-off-by: Anthony Harivel <aharivel@redhat.com> Link: https://lore.kernel.org/r/20240522153453.1230389-3-aharivel@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>