Age | Commit message | Author
2024-03-13 | gdbstub: Save target's siginfo | Gustavo Romero
Save target's siginfo into gdbserver_state so it can be used later, for example, in any stub that requires the target's si_signo and si_code. This change affects only linux-user mode. Signed-off-by: Gustavo Romero <gustavo.romero@linaro.org> Suggested-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20240309030901.1726211-4-gustavo.romero@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2024-03-13 | linux-user: Move tswap_siginfo out of target code | Gustavo Romero
Move tswap_siginfo from target code to handle_pending_signal. This will allow some cleanups and having the siginfo ready to be used in gdbstub. Signed-off-by: Gustavo Romero <gustavo.romero@linaro.org> Suggested-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20240309030901.1726211-3-gustavo.romero@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
2024-03-13 | gdbstub: Rename back gdb_handlesig | Gustavo Romero
Rename gdb_handlesig_reason back to gdb_handlesig. There is no need to add a wrapper for gdb_handlesig and rename it when a new parameter is added. Signed-off-by: Gustavo Romero <gustavo.romero@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20240309030901.1726211-2-gustavo.romero@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
2024-03-13 | tests/vm: ensure we build everything by default | Alex Bennée
The "check" target by itself is not enough to ensure we build the user mode binaries. While we can't test them with check-tcg we can at least include them in the build. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Gustavo Romero <gustavo.romero@linaro.org>
2024-03-13 | migration: Skip only empty block devices | Cédric Le Goater
The block .save_setup() handler calls a helper routine init_blk_migration() which builds a list of block devices to take into account for migration. When one device is found to be empty (sectors == 0), the loop exits and all the remaining devices are ignored. This is a regression introduced when bdrv_iterate() was removed. Change that by skipping only empty devices. Cc: Markus Armbruster <armbru@redhat.com> Cc: qemu-stable <qemu-stable@nongnu.org> Suggested-by: Kevin Wolf <kwolf@redhat.com> Fixes: fea68bb6e9fa ("block: Eliminate bdrv_iterate(), use bdrv_next()") Signed-off-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Link: https://lore.kernel.org/r/20240312120431.550054-1-clg@redhat.com [peterx: fix "Suggested-by:"] Signed-off-by: Peter Xu <peterx@redhat.com>
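The regression described above comes down to an early loop exit where a skip was intended. A minimal sketch in C, assuming a hypothetical `Dev` type and `count_migratable()` helper (neither is taken from the real init_blk_migration() code):

```c
#include <stddef.h>

/* Hypothetical stand-in for a block device; only its size matters here. */
typedef struct {
    long sectors;
} Dev;

/* Count the devices that would be queued for migration. */
static size_t count_migratable(const Dev *devs, size_t n)
{
    size_t queued = 0;

    for (size_t i = 0; i < n; i++) {
        if (devs[i].sectors <= 0) {
            /*
             * Skip only this empty device. A "break" here would also
             * drop every device that follows it in the list.
             */
            continue;
        }
        queued++;
    }
    return queued;
}
```

With the devices {100, 0, 200} the sketch counts two devices; with `break` in place of `continue` it would count only the first.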
2024-03-12 | docs/specs/pvpanic: document shutdown event | Thomas Weißschuh
Shutdown requests are normally hardware dependent. By extending pvpanic to also handle shutdown requests, guests can submit such requests with an easily implementable and cross-platform mechanism. Signed-off-by: Thomas Weißschuh <thomas@t-8ch.de> Message-Id: <20240310-pvpanic-shutdown-spec-v1-1-b258e182ce55@t-8ch.de> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12 | hw/cxl: Fix missing reserved data in CXL Device DVSEC | Jonathan Cameron
The r3.1 specification introduced a new 2 byte field, but to maintain DWORD alignment, an additional 2 reserved bytes were added. Those bytes were forgotten when the structure definition was updated but were included in the size define, leading to a buffer overrun. Also use the define so that we don't duplicate the value. Fixes: Coverity ID 1534095 buffer overrun Fixes: 8700ee15de ("hw/cxl: Standardize all references on CXL r3.1 and minor updates") Reported-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Message-Id: <20240308143831.6256-1-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
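One way to catch this class of mismatch at build time is a compile-time size check. The sketch below uses a toy structure; `ToyDvsec`, its field names, and `TOY_DVSEC_SIZE` are all hypothetical and do not reproduce the real CXL Device DVSEC layout:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical size define, in bytes, for the toy structure below. */
#define TOY_DVSEC_SIZE 8

/*
 * Hypothetical layout: a 2-byte field added by a newer spec revision,
 * followed by 2 reserved bytes that keep the structure DWORD aligned.
 */
typedef struct {
    uint16_t cap;
    uint16_t ctrl;
    uint16_t new_field;
    uint16_t reserved;   /* leaving this out makes the struct 2 bytes short */
} ToyDvsec;

/* Fails to compile if the structure and the size define drift apart. */
_Static_assert(sizeof(ToyDvsec) == TOY_DVSEC_SIZE,
               "structure size must match the size define");
```

Removing the `reserved` member from the sketch makes the assertion fire, which is the situation the commit describes: a size define that is larger than the structure it is supposed to describe.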
2024-03-12 | hmat acpi: Fix out of bounds access due to missing use of indirection | Jonathan Cameron
With a numa set up such as

    -numa nodeid=0,cpus=0 \
    -numa nodeid=1,memdev=mem \
    -numa nodeid=2,cpus=1

and appropriate hmat_lb entries the initiator list is correctly computed and written to HMAT as 0,2 but then the LB data is accessed using the node id (here 2), landing outside the entry_list array. Stash the reverse lookup when writing the initiator list and use it to get the correct array index. Fixes: 4586a2cb83 ("hmat acpi: Build System Locality Latency and Bandwidth Information Structure(s)") Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Message-Id: <20240307160326.31570-3-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
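The reverse lookup mentioned above can be pictured as a small table that maps a node id back to its position in the initiator list. This is a sketch only; `MAX_NODES` and `build_reverse_lookup()` are hypothetical names, not the ones used in the HMAT builder:

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_NODES 8   /* hypothetical upper bound for this sketch */

/*
 * Fill node_to_index[] so that node_to_index[node_id] is the position of
 * node_id in the initiator list, or -1 if the node is not an initiator.
 */
static void build_reverse_lookup(const uint32_t *initiators, size_t n,
                                 int *node_to_index)
{
    for (size_t i = 0; i < MAX_NODES; i++) {
        node_to_index[i] = -1;
    }
    for (size_t i = 0; i < n; i++) {
        if (initiators[i] < MAX_NODES) {
            node_to_index[initiators[i]] = (int)i;
        }
    }
}
```

For the initiator list 0,2 from the example, node 2 maps to index 1, so a two-entry array is indexed with 1 instead of the out-of-range node id 2.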
2024-03-12 | hmat acpi: Do not add Memory Proximity Domain Attributes Structure targeting non-existent memory | Jonathan Cameron
If QEMU is started with a proximity node containing CPUs alone, it will provide one of these structures to say memory in this node is directly connected to itself. This description is arguably pointless even if there is memory in the node. If there is no memory present, and hence no SRAT entry, it breaks Linux HMAT parsing and the table is rejected. https://elixir.bootlin.com/linux/v6.7/source/drivers/acpi/numa/hmat.c#L444 Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Message-Id: <20240307160326.31570-2-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12 | qemu-options.hx: Document the virtio-iommu-pci aw-bits option | Eric Auger
Document the new aw-bits option. Signed-off-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20240307134445.92296-10-eric.auger@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2024-03-12 | hw/arm/virt: Set virtio-iommu aw-bits default value to 48 | Eric Auger
On ARM we set 48b as a default (matching SMMUv3 SMMU_IDR5.VAX == 0). hw_compat_8_2 is used to handle the compatibility for machine types before 9.0 (default was 64 bits). Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Zhenzhong Duan <Zhenzhong.duan@intel.com> Message-Id: <20240307134445.92296-9-eric.auger@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12 | hw/i386/q35: Set virtio-iommu aw-bits default value to 39 | Eric Auger
Currently the default input range can extend to 64 bits. On x86, when the virtio-iommu protects vfio devices, the physical iommu may support only 39 bits. Let's set the default to 39, as done for the intel-iommu. We use hw_compat_8_2 to handle the compatibility for machines before 9.0 which used to have a virtio-iommu default input range of 64 bits. Of course if aw-bits is set from the command line, the default is overridden. Signed-off-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20240307134445.92296-8-eric.auger@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
2024-03-12 | virtio-iommu: Add an option to define the input range width | Eric Auger
aw-bits is a new option that sets the bit width of the input address range. This value will be used as a default for the device config input_range.end. By default it is set to 64 bits which is the current value. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Message-Id: <20240307134445.92296-7-eric.auger@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
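The relation between a bit width and the end of the input range is simple arithmetic, with one corner case at 64 bits. A minimal sketch, assuming the end address is the largest value representable in aw-bits bits; `input_range_end()` is a hypothetical helper, not the device's actual code:

```c
#include <stdint.h>

/* Hypothetical helper: highest input address for a given bit width. */
static uint64_t input_range_end(unsigned aw_bits)
{
    /* Shifting a 64-bit value by 64 is undefined in C, so treat it apart. */
    if (aw_bits >= 64) {
        return UINT64_MAX;
    }
    return (UINT64_C(1) << aw_bits) - 1;
}
```

Under that assumption, a width of 39 gives 0x7FFFFFFFFF and a width of 48 gives 0xFFFFFFFFFFFF, which are the two machine defaults introduced by the commits above.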
2024-03-12 | virtio-iommu: Trace domain range limits as unsigned int | Eric Auger
Use %u format to trace domain_range limits. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Message-Id: <20240307134445.92296-6-eric.auger@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12 | qemu-options.hx: Document the virtio-iommu-pci granule option | Eric Auger
We are missing an entry for the virtio-iommu-pci device. Add the information on which machine it is currently supported and document the new granule option. Signed-off-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20240307134445.92296-5-eric.auger@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2024-03-12 | virtio-iommu: Change the default granule to the host page size | Eric Auger
We used to set the default granule to 4KB but with VFIO assignment it makes more sense to use the actual host page size. Indeed when hotplugging a VFIO device protected by a virtio-iommu on a 64kB/64kB host/guest config, we currently get a qemu crash: "vfio: DMA mapping failed, unable to continue" This is due to the hot-attached VFIO device calling memory_region_iommu_set_page_size_mask() with 64kB granule whereas the virtio-iommu granule was already frozen to 4KB on machine init done. Set the granule property to "host" and introduce a new compat. The page size mask used before 9.0 was qemu_target_page_mask(). Since the virtio-iommu currently only supports x86_64 and aarch64, this matched a 4KB granule. Note that the new default will prevent 4kB guest on 64kB host because the granule will be set to 64kB which would be larger than the guest page size. In that situation, the virtio-iommu driver fails on viommu_domain_finalise() with "granule 0x10000 larger than system page size 0x1000". In that case the workaround is to request 4K granule. The current limitation of global granule in the virtio-iommu should be removed and turned into per domain granule. But until we get this upgraded, this new default is probably better because I don't think anyone is currently interested in running a 4KB page size guest with virtio-iommu on a 64KB host. However, supporting 64kB guest on 64kB host with virtio-iommu and VFIO looks like a more important feature. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Message-Id: <20240307134445.92296-4-eric.auger@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12 | virtio-iommu: Add a granule property | Eric Auger
This allows choosing which granule will be used by default by the virtio-iommu. Current page size mask default is qemu_target_page_mask so this translates into a 4k granule on ARM and x86_64 where virtio-iommu is supported. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Message-Id: <20240307134445.92296-3-eric.auger@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12 | hw/i386/acpi-build: Add support for SRAT Generic Initiator structures | Ankit Agrawal
The acpi-generic-initiator object is added to allow a host device to be linked with a NUMA node. QEMU uses it to build the SRAT Generic Initiator Affinity structure [1]. Add support for i386. [1] ACPI Spec 6.3, Section 5.2.16.6 Suggested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Ankit Agrawal <ankita@nvidia.com> Message-Id: <20240308145525.10886-4-ankita@nvidia.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-03-12 | hw/acpi: Implement the SRAT GI affinity structure | Ankit Agrawal
The ACPI spec provides a scheme to associate "Generic Initiators" [1] (e.g. heterogeneous processors and accelerators, GPUs, and I/O devices with integrated compute or DMA engines) with Proximity Domains. This is achieved using the Generic Initiator Affinity Structure in SRAT. During bootup, the Linux kernel parses the ACPI SRAT to determine the PXM ids and creates a NUMA node for each unique PXM ID encountered. QEMU currently does not implement these structures while building SRAT. Add GI structures while building the VM ACPI SRAT. The association between device and node is stored using the acpi-generic-initiator object. Look up the presence of all such objects and use them to build these structures. The structure needs a PCI device handle [2] that consists of the device BDF. The vfio-pci device corresponding to the acpi-generic-initiator object is located to determine the BDF. [1] ACPI Spec 6.3, Section 5.2.16.6 [2] ACPI Spec 6.3, Table 5.80 Cc: Jonathan Cameron <qemu-devel@nongnu.org> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Cedric Le Goater <clg@redhat.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Ankit Agrawal <ankita@nvidia.com> Message-Id: <20240308145525.10886-3-ankita@nvidia.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12 | qom: new object to associate device to NUMA node | Ankit Agrawal
NVIDIA GPUs support the MIG (Multi-Instance GPU) feature [1], which allows partitioning of the GPU device resources (including device memory) into several (up to 8) isolated instances. Each memory partition needs a dedicated NUMA node to operate. The partitions are not fixed and they can be created/deleted at runtime.

Unfortunately Linux does not provide a means to dynamically create/destroy NUMA nodes and such a feature is not expected to be trivial to implement. The nodes that the OS discovers at boot time while parsing SRAT remain fixed. So we utilize the Generic Initiator (GI) Affinity structures that allow association between nodes and devices. Multiple GI structures per BDF are possible, allowing creation of multiple nodes by exposing a unique PXM in each of these structures.

Implement the mechanism to build the GI affinity structures as QEMU currently does not. Introduce a new acpi-generic-initiator object to allow the host admin to link a device with an associated NUMA node. QEMU maintains this association and uses this object to build the requisite GI Affinity Structure.

When multiple NUMA nodes are associated with a device, it is required to create that many acpi-generic-initiator objects, each representing a unique device:node association.

The following is a decoded GI affinity structure in the VM ACPI SRAT.

    [0C8h 0200 1]                Subtable Type : 05 [Generic Initiator Affinity]
    [0C9h 0201 1]                       Length : 20
    [0CAh 0202 1]                    Reserved1 : 00
    [0CBh 0203 1]           Device Handle Type : 01
    [0CCh 0204 4]             Proximity Domain : 00000007
    [0D0h 0208 16]               Device Handle : 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00
    [0E0h 0224 4]        Flags (decoded below) : 00000001
                                       Enabled : 1
    [0E4h 0228 4]                    Reserved2 : 00000000

    [0E8h 0232 1]                Subtable Type : 05 [Generic Initiator Affinity]
    [0E9h 0233 1]                       Length : 20

An admin can provide a range of acpi-generic-initiator objects, each associating a device (by providing the id through the pci-dev argument) to the desired NUMA node (using the node argument). Currently, only PCI devices are supported.

For the Grace Hopper system, create a range of 8 nodes and associate that with the device using the acpi-generic-initiator object. While a configuration of less than 8 nodes per device is allowed, such a configuration will prevent utilization of the feature to the fullest. The following sample creates 8 nodes per PCI device for a VM with 2 PCI devices and links them to the respective PCI device using acpi-generic-initiator objects:

    -numa node,nodeid=2 -numa node,nodeid=3 -numa node,nodeid=4 \
    -numa node,nodeid=5 -numa node,nodeid=6 -numa node,nodeid=7 \
    -numa node,nodeid=8 -numa node,nodeid=9 \
    -device vfio-pci-nohotplug,host=0009:01:00.0,bus=pcie.0,addr=04.0,rombar=0,id=dev0 \
    -object acpi-generic-initiator,id=gi0,pci-dev=dev0,node=2 \
    -object acpi-generic-initiator,id=gi1,pci-dev=dev0,node=3 \
    -object acpi-generic-initiator,id=gi2,pci-dev=dev0,node=4 \
    -object acpi-generic-initiator,id=gi3,pci-dev=dev0,node=5 \
    -object acpi-generic-initiator,id=gi4,pci-dev=dev0,node=6 \
    -object acpi-generic-initiator,id=gi5,pci-dev=dev0,node=7 \
    -object acpi-generic-initiator,id=gi6,pci-dev=dev0,node=8 \
    -object acpi-generic-initiator,id=gi7,pci-dev=dev0,node=9 \
    -numa node,nodeid=10 -numa node,nodeid=11 -numa node,nodeid=12 \
    -numa node,nodeid=13 -numa node,nodeid=14 -numa node,nodeid=15 \
    -numa node,nodeid=16 -numa node,nodeid=17 \
    -device vfio-pci-nohotplug,host=0009:01:01.0,bus=pcie.0,addr=05.0,rombar=0,id=dev1 \
    -object acpi-generic-initiator,id=gi8,pci-dev=dev1,node=10 \
    -object acpi-generic-initiator,id=gi9,pci-dev=dev1,node=11 \
    -object acpi-generic-initiator,id=gi10,pci-dev=dev1,node=12 \
    -object acpi-generic-initiator,id=gi11,pci-dev=dev1,node=13 \
    -object acpi-generic-initiator,id=gi12,pci-dev=dev1,node=14 \
    -object acpi-generic-initiator,id=gi13,pci-dev=dev1,node=15 \
    -object acpi-generic-initiator,id=gi14,pci-dev=dev1,node=16 \
    -object acpi-generic-initiator,id=gi15,pci-dev=dev1,node=17 \

Link: https://www.nvidia.com/en-in/technologies/multi-instance-gpu [1] Cc: Jonathan Cameron <qemu-devel@nongnu.org> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Markus Armbruster <armbru@redhat.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Ankit Agrawal <ankita@nvidia.com> Message-Id: <20240308145525.10886-2-ankita@nvidia.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12 | hw/i386/pc: Inline pc_cmos_init() into pc_cmos_init_late() and remove it | Bernhard Beschow
Now that pc_cmos_init() doesn't populate the X86MachineState::rtc attribute any longer, its duties can be merged into pc_cmos_init_late() which is called within machine_done notifier. This frees pc_piix and pc_q35 from explicit CMOS initialization. Signed-off-by: Bernhard Beschow <shentey@gmail.com> Message-Id: <20240303185332.1408-5-shentey@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12 | hw/i386/pc: Set "normal" boot device order in pc_basic_device_init() | Bernhard Beschow
The boot device order may change during the lifetime of a VM. Usually, the "normal" order is set once during machine init(). However, if a user specifies `-boot once=...`, the "normal" order is overwritten by the "once" order just before machine_done, and a reset handler is registered which restores the "normal" order during the next reset. In the next patch, pc_cmos_init() will be inlined into pc_cmos_init_late() which runs during machine_done. This means that the "once" boot order would be overwritten again with the "normal" boot order -- which renders the user's choice ineffective. Fix this by setting the "normal" boot order in pc_basic_device_init() which already registers the boot_set() handler. Signed-off-by: Bernhard Beschow <shentey@gmail.com> Message-Id: <20240303185332.1408-4-shentey@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12 | hw/i386/pc: Avoid one use of the current_machine global | Bernhard Beschow
The RTC can be accessed through the X86 machine instance, so rather than passing the RTC it's possible to pass the machine state instead. This avoids pc_boot_set() from having to access the current_machine global. Signed-off-by: Bernhard Beschow <shentey@gmail.com> Message-Id: <20240303185332.1408-3-shentey@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2024-03-12 | hw/i386/pc: Remove "rtc_state" link again | Bernhard Beschow
Commit 99e1c1137b6f "hw/i386/pc: Populate RTC attribute directly" made linking the "rtc_state" property unnecessary and removed it. Commit 84e945aad2d0 "vl, pc: turn -no-fd-bootchk into a machine property" accidentally reintroduced the link. Remove it again since it is not needed. Fixes: 84e945aad2d0 "vl, pc: turn -no-fd-bootchk into a machine property" Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Bernhard Beschow <shentey@gmail.com> Message-Id: <20240303185332.1408-2-shentey@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2024-03-12 | Revert "hw/i386/pc: Confine system flash handling to pc_sysfw" | Bernhard Beschow
Specifying the property `-M pflash0` results in a regression: qemu-system-x86_64: Property 'pc-q35-9.0-machine.pflash0' not found Revert the change for now until a solution is found. This reverts commit 6f6ad2b24582593d8feb00434ce2396840666227. Reported-by: Volker Rümelin <vr_qemu@t-online.de> Signed-off-by: Bernhard Beschow <shentey@gmail.com> Message-Id: <20240226215909.30884-3-shentey@gmail.com> Tested-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12 | Revert "hw/i386/pc_sysfw: Inline pc_system_flash_create() and remove it" | Bernhard Beschow
Commit 6f6ad2b24582 "hw/i386/pc: Confine system flash handling to pc_sysfw" causes a regression when specifying the property `-M pflash0` in the PCI PC machines: qemu-system-x86_64: Property 'pc-q35-9.0-machine.pflash0' not found In order to revert the commit, the commit below must be reverted first. This reverts commit cb05cc16029bb0a61ac5279ab7b3b90dcf2aa69f. Signed-off-by: Bernhard Beschow <shentey@gmail.com> Message-Id: <20240226215909.30884-2-shentey@gmail.com> Tested-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12 | pc: q35: Bump max_cpus to 4096 vcpus | Ani Sinha
Since commit f10a570b093e6 ("KVM: x86: Add CONFIG_KVM_MAX_NR_VCPUS to allow up to 4096 vCPUs") the Linux kernel can support up to a maximum of 4096 vcpus when MAXSMP is enabled in the kernel. At present, QEMU has been tested to correctly boot a Linux guest with 4096 vcpus using the current edk2 upstream master branch that has the fixes corresponding to the following two PRs:

    https://github.com/tianocore/edk2/pull/5410
    https://github.com/tianocore/edk2/pull/5418

The changes merged into edk2 with the above PRs will be in the upcoming 2024-05 release. With current seabios firmware, it boots fine with 4096 vcpus already. So bump up the value of max_cpus to 4096 for q35 machine versions 9 and newer. Q35 machine versions 8.2 and older continue to support a maximum of 1024 vcpus as before for compatibility reasons.

If KVM is not able to support the specified number of vcpus, QEMU would return the following error messages:

    $ ./qemu-system-x86_64 -cpu host -accel kvm -machine q35 -smp 1728
    qemu-system-x86_64: -accel kvm: warning: Number of SMP cpus requested (1728) exceeds the recommended cpus supported by KVM (12)
    qemu-system-x86_64: -accel kvm: warning: Number of hotpluggable cpus requested (1728) exceeds the recommended cpus supported by KVM (12)
    Number of SMP cpus requested (1728) exceeds the maximum cpus supported by KVM (1024)

Cc: Daniel P. Berrangé <berrange@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Julia Suvorova <jusual@redhat.com> Cc: kraxel@redhat.com Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Ani Sinha <anisinha@redhat.com> Message-Id: <20240228143351.3967-1-anisinha@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12 | hw/pci: Always call pcie_sriov_pf_reset() | Akihiko Odaki
Call pcie_sriov_pf_reset() from pci_do_device_reset() just as we do for msi_reset() and msix_reset() to prevent duplicating code for each SR-IOV PF. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-Id: <20240228-reuse-v8-5-282660281e60@daynix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Sriram Yagnaraman <sriram.yagnaraman@ericsson.com>
2024-03-12 | pcie_sriov: Do not reset NumVFs after disabling VFs | Akihiko Odaki
The spec does not say that NumVFs is reset after disabling VFs except when resetting the PF. Clearing it is guest visible and out of spec, even though Linux doesn't rely on this value being preserved, so we never noticed. Fixes: 7c0fa8dff811 ("pcie: Add support for Single Root I/O Virtualization (SR/IOV)") Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-Id: <20240228-reuse-v8-4-282660281e60@daynix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12 | pcie_sriov: Reset SR-IOV extended capability | Akihiko Odaki
pcie_sriov_pf_disable_vfs() is called when resetting the PF, but it only disables VFs and does not reset SR-IOV extended capability, leaking the state and making the VF Enable register inconsistent with the actual state. Replace pcie_sriov_pf_disable_vfs() with pcie_sriov_pf_reset(), which does not only disable VFs but also resets the capability. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-Id: <20240228-reuse-v8-3-282660281e60@daynix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Sriram Yagnaraman <sriram.yagnaraman@ericsson.com>
2024-03-12 | pcie_sriov: Validate NumVFs | Akihiko Odaki
The guest may write NumVFs greater than TotalVFs and that can lead to buffer overflow in VF implementations. Cc: qemu-stable@nongnu.org Fixes: CVE-2024-26327 Fixes: 7c0fa8dff811 ("pcie: Add support for Single Root I/O Virtualization (SR/IOV)") Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-Id: <20240228-reuse-v8-2-282660281e60@daynix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Sriram Yagnaraman <sriram.yagnaraman@ericsson.com>
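The validation amounts to comparing the guest-written count against the advertised limit before any per-VF state is touched. The sketch below is an assumption about the shape of such a check; `num_vfs_is_valid()` and `vfs_to_enable()` are hypothetical helpers, and rejecting the request by enabling zero VFs is an illustrative choice, since the log does not say how the real code reacts to an invalid value:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: accept a guest-written NumVFs only if it fits. */
static bool num_vfs_is_valid(uint16_t num_vfs, uint16_t total_vfs)
{
    return num_vfs <= total_vfs;
}

/*
 * Number of VFs to enable for a request, or 0 when the request is
 * rejected (assumed policy for this sketch).
 */
static uint16_t vfs_to_enable(uint16_t num_vfs, uint16_t total_vfs)
{
    return num_vfs_is_valid(num_vfs, total_vfs) ? num_vfs : 0;
}
```

Any array sized by TotalVFs can then be indexed by the returned count without running past its end.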
2024-03-12 | hw/nvme: Use pcie_sriov_num_vfs() | Akihiko Odaki
nvme_sriov_pre_write_ctrl() used to directly inspect SR-IOV configurations to know the number of VFs being disabled due to SR-IOV configuration writes, but the logic was flawed and resulted in out-of-bound memory access. It assumed PCI_SRIOV_NUM_VF always has the number of currently enabled VFs, but it actually doesn't in the following cases:

- PCI_SRIOV_NUM_VF has been set but PCI_SRIOV_CTRL_VFE has never been.
- PCI_SRIOV_NUM_VF was written after PCI_SRIOV_CTRL_VFE was set.
- VFs were only partially enabled because of realization failure.

It is a responsibility of pcie_sriov to interpret SR-IOV configurations and pcie_sriov does it correctly, so use pcie_sriov_num_vfs(), which it provides, to get the number of enabled VFs before and after SR-IOV configuration writes. Cc: qemu-stable@nongnu.org Fixes: CVE-2024-26328 Fixes: 11871f53ef8e ("hw/nvme: Add support for the Virtualization Management command") Suggested-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-Id: <20240228-reuse-v8-1-282660281e60@daynix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12 | Implement SMBIOS type 9 v2.6 | Felix Wu
Signed-off-by: Felix Wu <flwu@google.com> Signed-off-by: Nabih Estefan <nabihestefan@google.com> Message-Id: <20240221170027.1027325-3-nabihestefan@google.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12 | Implement base of SMBIOS type 9 descriptor. | Felix Wu
Version 2.1+. Signed-off-by: Felix Wu <flwu@google.com> Signed-off-by: Nabih Estefan <nabihestefan@google.com> Message-Id: <20240221170027.1027325-2-nabihestefan@google.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12 | hw/intc: Check @errp to handle the error of IOAPICCommonClass.realize() | Zhao Liu
IOAPICCommonClass implements its own private realize(), and this private realize() allows error. Since IOAPICCommonClass.realize() returns void, to check the error, dereference @errp with ERRP_GUARD(). Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Message-Id: <20240223085653.1255438-8-zhao1.liu@linux.intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2024-03-12 | hw/vfio/iommufd: Fix missing ERRP_GUARD() in iommufd_cdev_getfd() | Zhao Liu
As the comment in qapi/error says, dereferencing @errp requires ERRP_GUARD():

    * = Why, when and how to use ERRP_GUARD() =
    *
    * Without ERRP_GUARD(), use of the @errp parameter is restricted:
    * - It must not be dereferenced, because it may be null.
    ...
    * ERRP_GUARD() lifts these restrictions.
    *
    * To use ERRP_GUARD(), add it right at the beginning of the function.
    * @errp can then be used without worrying about the argument being
    * NULL or &error_fatal.
    *
    * Using it when it's not needed is safe, but please avoid cluttering
    * the source with useless code.

But in iommufd_cdev_getfd(), @errp is dereferenced without ERRP_GUARD():

    if (*errp) {
        error_prepend(errp, VFIO_MSG_PREFIX, path);
    }

Currently, since vfio_attach_device() - the caller of iommufd_cdev_getfd() - is always called in DeviceClass.realize() context and is never passed a NULL @errp, iommufd_cdev_getfd() hasn't triggered the bug of dereferencing a NULL @errp. To follow the requirement of @errp, add the missing ERRP_GUARD() in iommufd_cdev_getfd(). Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20240223085653.1255438-7-zhao1.liu@linux.intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12 | hw/pci-bridge/cxl_upstream: Fix missing ERRP_GUARD() in cxl_usp_realize() | Zhao Liu
As the comment in qapi/error says, dereferencing @errp requires ERRP_GUARD():

    * = Why, when and how to use ERRP_GUARD() =
    *
    * Without ERRP_GUARD(), use of the @errp parameter is restricted:
    * - It must not be dereferenced, because it may be null.
    ...
    * ERRP_GUARD() lifts these restrictions.
    *
    * To use ERRP_GUARD(), add it right at the beginning of the function.
    * @errp can then be used without worrying about the argument being
    * NULL or &error_fatal.
    *
    * Using it when it's not needed is safe, but please avoid cluttering
    * the source with useless code.

But in cxl_usp_realize(), @errp is dereferenced without ERRP_GUARD():

    cxl_doe_cdat_init(cxl_cstate, errp);
    if (*errp) {
        goto err_cap;
    }

Here we check *errp, because cxl_doe_cdat_init() returns void. And since cxl_usp_realize() - as a PCIDeviceClass.realize() method - is never passed a NULL @errp, it hasn't triggered the bug of dereferencing a NULL @errp. To follow the requirement of @errp, add the missing ERRP_GUARD() in cxl_usp_realize(). Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20240223085653.1255438-6-zhao1.liu@linux.intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Thomas Huth <thuth@redhat.com>
2024-03-12hw/misc/xlnx-versal-trng: Check returned bool in trng_prop_fault_event_set()Zhao Liu
As the comment in qapi/error says, dereferencing @errp requires ERRP_GUARD(): * = Why, when and how to use ERRP_GUARD() = * * Without ERRP_GUARD(), use of the @errp parameter is restricted: * - It must not be dereferenced, because it may be null. ... * ERRP_GUARD() lifts these restrictions. * * To use ERRP_GUARD(), add it right at the beginning of the function. * @errp can then be used without worrying about the argument being * NULL or &error_fatal. * * Using it when it's not needed is safe, but please avoid cluttering * the source with useless code. But in trng_prop_fault_event_set(), @errp is dereferenced without ERRP_GUARD(): visit_type_uint32(v, name, events, errp); if (*errp) { return; } Currently, since trng_prop_fault_event_set() doesn't get a NULL @errp parameter as a "set" method of an object property, it hasn't triggered the bug of dereferencing a NULL @errp. And since visit_type_uint32() returns bool, check the returned bool directly instead of dereferencing @errp, so that the missing ERRP_GUARD() does not need to be added. Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Message-Id: <20240223085653.1255438-5-zhao1.liu@linux.intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
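The pattern described in the commit above (test the callee's bool return value instead of dereferencing @errp) can be sketched outside of QEMU as follows. This is only an illustration under stated assumptions: the `Error` struct, `parse_u32()` and `set_events()` are hypothetical stand-ins, not QEMU's real `Error` type, `visit_type_uint32()` or `trng_prop_fault_event_set()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an error object; NOT QEMU's Error type. */
typedef struct Error {
    const char *msg;
} Error;

/*
 * Hypothetical callee in the style of a visitor function: returns true
 * on success and false on failure. On failure it reports through errp
 * only if the caller passed a non-NULL errp.
 */
static bool parse_u32(long long in, uint32_t *out, Error **errp)
{
    static Error range_err = { "value out of range" };

    assert(out != NULL);
    if (in < 0 || in > UINT32_MAX) {
        if (errp) {
            *errp = &range_err;
        }
        return false;
    }
    *out = (uint32_t)in;
    return true;
}

/*
 * Hypothetical property setter. It relies on the bool return value and
 * never evaluates *errp, so it stays safe when errp is NULL.
 */
static void set_events(uint32_t *events, long long in, Error **errp)
{
    if (!parse_u32(in, events, errp)) {
        return; /* leave *events unchanged on failure */
    }
}
```

Calling `set_events(&ev, -1, NULL)` in this sketch leaves `ev` untouched and does not crash, which is the property the commit obtains without adding ERRP_GUARD().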
2024-03-12hw/mem/cxl_type3: Fix missing ERRP_GUARD() in ct3_realize()Zhao Liu
As the comment in qapi/error says, dereferencing @errp requires ERRP_GUARD(): * = Why, when and how to use ERRP_GUARD() = * * Without ERRP_GUARD(), use of the @errp parameter is restricted: * - It must not be dereferenced, because it may be null. ... * ERRP_GUARD() lifts these restrictions. * * To use ERRP_GUARD(), add it right at the beginning of the function. * @errp can then be used without worrying about the argument being * NULL or &error_fatal. * * Using it when it's not needed is safe, but please avoid cluttering * the source with useless code. But in ct3_realize(), @errp is dereferenced without ERRP_GUARD(): cxl_doe_cdat_init(cxl_cstate, errp); if (*errp) { goto err_free_special_ops; } Here we check *errp, because cxl_doe_cdat_init() returns void. And since ct3_realize() - as a PCIDeviceClass.realize() method - doesn't get a NULL @errp parameter, it hasn't triggered the bug of dereferencing a NULL @errp. To follow the requirement of @errp, add the missing ERRP_GUARD() in ct3_realize(). Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20240223085653.1255438-4-zhao1.liu@linux.intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-03-12hw/display/macfb: Fix missing ERRP_GUARD() in macfb_nubus_realize()Zhao Liu
As the comment in qapi/error says, dereferencing @errp requires ERRP_GUARD(): * = Why, when and how to use ERRP_GUARD() = * * Without ERRP_GUARD(), use of the @errp parameter is restricted: * - It must not be dereferenced, because it may be null. ... * ERRP_GUARD() lifts these restrictions. * * To use ERRP_GUARD(), add it right at the beginning of the function. * @errp can then be used without worrying about the argument being * NULL or &error_fatal. * * Using it when it's not needed is safe, but please avoid cluttering * the source with useless code. But in macfb_nubus_realize(), @errp is dereferenced without ERRP_GUARD(): ndc->parent_realize(dev, errp); if (*errp) { return; } Here we check *errp, because ndc->parent_realize(), as a DeviceClass.realize() callback, returns void. And since macfb_nubus_realize(), also as a DeviceClass.realize() callback, doesn't get a NULL @errp parameter, it hasn't triggered the bug of dereferencing a NULL @errp. To follow the requirement of @errp, add the missing ERRP_GUARD() in macfb_nubus_realize(). Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20240223085653.1255438-3-zhao1.liu@linux.intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12hw/cxl/cxl-host: Fix missing ERRP_GUARD() in cxl_fixed_memory_window_config()Zhao Liu
As the comment in qapi/error says, dereferencing @errp requires ERRP_GUARD(): * = Why, when and how to use ERRP_GUARD() = * * Without ERRP_GUARD(), use of the @errp parameter is restricted: * - It must not be dereferenced, because it may be null. ... * ERRP_GUARD() lifts these restrictions. * * To use ERRP_GUARD(), add it right at the beginning of the function. * @errp can then be used without worrying about the argument being * NULL or &error_fatal. * * Using it when it's not needed is safe, but please avoid cluttering * the source with useless code. But in cxl_fixed_memory_window_config(), @errp is dereferenced in 2 places without ERRP_GUARD(): fw->enc_int_ways = cxl_interleave_ways_enc(fw->num_targets, errp); if (*errp) { return; } and fw->enc_int_gran = cxl_interleave_granularity_enc(object->interleave_granularity, errp); if (*errp) { return; } For the above 2 places, we check "*errp", because neither function returns a suitable error code. And since machine_set_cfmw() - the caller of cxl_fixed_memory_window_config() - doesn't get a NULL @errp parameter as the "set" method of an object property, cxl_fixed_memory_window_config() hasn't triggered the bug of dereferencing a NULL @errp. To follow the requirement of @errp, add the missing ERRP_GUARD() in cxl_fixed_memory_window_config(). Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20240223085653.1255438-2-zhao1.liu@linux.intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-03-12hw/virtio: Add support for VDPA network simulation devicesHao Chen
This patch adds support for VDPA network simulation devices. The device is developed based on virtio-net and tap backend, and supports hardware live migration function. For more details, please refer to "docs/system/devices/vdpa-net.rst" Signed-off-by: Hao Chen <chenh@yusur.tech> Message-Id: <20240221073802.2888022-1-chenh@yusur.tech> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12hw/virtio: check owner for removing objectsAlbert Esteve
Shared objects lack spoofing protection. For VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE messages received by the vhost-user interface, any backend was allowed to remove entries from the shared table just by knowing the UUID. Only the owner of the entry shall be allowed to remove their resources from the table. To fix that, add a check for all *SHARED_OBJECT_REMOVE messages received. A vhost device can only remove TYPE_VHOST_DEV entries that it owns; otherwise skip the removal, and inform the device in the answer that the entry has not been removed. Signed-off-by: Albert Esteve <aesteve@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20240219143423.272012-2-aesteve@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12hw/audio/virtio-sound: return correct command response sizeVolker Rümelin
The payload size returned by command VIRTIO_SND_R_PCM_INFO is wrong. The code in process_cmd() assumes that all commands return only a virtio_snd_hdr payload, but some commands like VIRTIO_SND_R_PCM_INFO may return an additional payload. Add a zero initialized payload_size variable to struct virtio_snd_ctrl_command to allow for additional payloads. Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Volker Rümelin <vr_qemu@t-online.de> Message-Id: <20240218083351.8524-1-vr_qemu@t-online.de> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12hw/pci-bridge/pxb-cxl: Drop RAS capability from host bridge.Jonathan Cameron
This CXL component isn't allowed to have a RAS capability. Whilst this should be harmless as software is not expected to look here, good to clean it up. Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Message-Id: <20240215155206.2736-1-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12vdpa: trace skipped memory sectionsEugenio Pérez
Sometimes, certain parts are not being skipped in vhost_vdpa_listener_region_del, but they are skipped in vhost_vdpa_listener_region_add, or vice versa. The vhost-vdpa code expects all parts to maintain their properties, so we're adding a trace to help with debugging when any part is skipped. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20240215103616.330518-3-eperezma@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12vdpa: stash memory region properties in varsEugenio Pérez
The next changes use these variables, so avoid calling the memory region functions repeatedly. No functional change intended. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20240215103616.330518-2-eperezma@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12pcie: Support PCIe Gen5/Gen6 link speedsLukas Stockner
This patch extends the PCIe link speed option so that slots can be configured as supporting 32GT/s (Gen5) or 64GT/s (Gen6) speeds. This is as simple as setting the appropriate bit in LnkCap2 and the appropriate value in LnkCap and LnkCtl2. Signed-off-by: Lukas Stockner <lstockner@genesiscloud.com> Message-Id: <20240215012326.3272366-1-lstockner@genesiscloud.com> Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12libvhost-user: Mark mmap'ed region memory as MADV_DONTDUMPDavid Hildenbrand
We already use MADV_NORESERVE to deal with sparse memory regions. Let's also set madvise(MADV_DONTDUMP), otherwise a crash of the process can result in us allocating all memory in the mmap'ed region for dumping purposes. This change implies that the mmap'ed rings won't be included in a coredump. If ever required for debugging purposes, we could mark only the mapped rings MADV_DODUMP. Ignore errors during madvise() for now. Reviewed-by: Raphael Norwitz <raphael@enfabrica.net> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20240214151701.29906-15-david@redhat.com> Tested-by: Mario Casquero <mcasquer@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
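A minimal sketch of the idea in the commit above, assuming a Linux host: map a sparse region without reserving backing store, then mark it MADV_DONTDUMP on a best-effort basis. The helper name `map_sparse_region()` is hypothetical, and the anonymous private mapping is an assumption made to keep the example self-contained; it is not the libvhost-user code, which maps memory regions provided by the frontend.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: create a sparse, readable and writable mapping
 * of len bytes and exclude it from core dumps.
 * Returns the mapping address, or NULL if mmap() fails.
 */
static void *map_sparse_region(size_t len)
{
    void *addr;

    assert(len > 0);

    /* MAP_NORESERVE: do not reserve swap space for the whole range. */
    addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (addr == MAP_FAILED) {
        return NULL;
    }

    /*
     * Keep the range out of core dumps. As in the commit message,
     * errors from madvise() are ignored.
     */
    (void)madvise(addr, len, MADV_DONTDUMP);

    return addr;
}
```

If the contents of part of the range were ever needed in a core dump, that part could be switched back with `madvise(ptr, size, MADV_DODUMP)`, which is the option the commit message mentions for the mapped rings.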
2024-03-12libvhost-user: Dynamically remap rings after (temporarily?) removing memory regionsDavid Hildenbrand
Currently, we try to remap all rings whenever we add a single new memory region. That doesn't quite make sense, because we already map rings when setting the ring address, and panic if that goes wrong. Likely, that handling was simply copied from set_mem_table code, where we actually have to remap all rings. Remapping all rings might require us to walk quite a lot of memory regions to perform the address translations. Ideally, we'd simply remove that remapping. However, let's be a bit careful. There might be some weird corner cases where we might temporarily remove a single memory region (e.g., resize it), that would have worked for now. Further, a ring might be located on hotplugged memory, and as the VM reboots, we might unplug that memory, to hotplug memory before resetting the ring addresses. So let's unmap affected rings as we remove a memory region, and try dynamically mapping the ring again when required. Acked-by: Raphael Norwitz <raphael@enfabrica.net> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20240214151701.29906-14-david@redhat.com> Tested-by: Mario Casquero <mcasquer@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>