aboutsummaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2024-07-26ppc/pnv: Begin a more complete ADU LPC model for POWER9/10Nicholas Piggin
This implements a framework for an ADU unit model. The ADU unit actually implements XSCOM, which is the bridge between MMIO and PIB. However it also includes control and status registers and other functions that are exposed as PIB (xscom) registers. To keep things simple, pnv_xscom.c remains the XSCOM bridge implementation, and pnv_adu.c implements the ADU registers and other functions. So far, just the ADU no-op registers in the pnv_xscom.c default handler are moved over to the adu model. Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2024-07-26ppc/pnv: Implement POWER9 LPC PSI serirq outputs and auto-clear functionNicholas Piggin
The POWER8 LPC ISA device irqs all get combined and reported to the line connected the PSI LPCHC irq. POWER9 changed this so only internal LPC host controller irqs use that line, and the device irqs get routed to 4 new lines connected to PSI SERIRQ0-3. POWER9 also introduced a new feature that automatically clears the irq status in the LPC host controller when EOI'ed, so software does not have to. The powernv OPAL (skiboot) firmware managed to work because the LPCHC irq handler scanned all LPC irqs and handled those including clearing status even on POWER9 systems. So LPC irqs worked despite OPAL thinking it was running in POWER9 mode. After this change, UART interrupts show up on serirq1 which is where OPAL routes them to: cat /proc/interrupts ... 20: 0 XIVE-IRQ 1048563 Level opal-psi#0:lpchc ... 25: 34 XIVE-IRQ 1048568 Level opal-psi#0:lpc_serirq_mux1 Whereas they previously turn up on lpchc. Reviewed-by: Glenn Miles <milesg@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2024-07-26ppc/pnv: Fix loss of LPC SERIRQ interruptsGlenn Miles
The LPC HC irq status register bits are set when an LPC IRQSER input is asserted. These irq status bits drive the PSI irq to the CPU interrupt controller. The LPC HC irq status bits are cleared by software writing to the register with 1's for the bits to clear. Existing register write was clearing the irq status bits even when the input was asserted, this results in interrupts being lost. This fix changes the behavior to keep track of the device IRQ status in internal state that is separate from the irq status register, and only allowing the irq status bits to be cleared if the associated input is not asserted. Signed-off-by: Glenn Miles <milesg@linux.ibm.com> [np: rebased before P9 PSI SERIRQ patch, adjust changelog/comments] Reviewed-by: Glenn Miles <milesg@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2024-07-26cpu-common.c: export cpu_get_free_index to be reused laterHarsh Prateek Bora
This helper provides an easy way to identify the next available free cpu index which can be used for vcpu creation. Until now, this is being called at a very later stage and there is a need to be able to call it earlier (for now, with ppc64) hence the need to export. Suggested-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2024-07-26accel/kvm: Introduce kvm_create_and_park_vcpu() helperHarsh Prateek Bora
There are distinct helpers for creating and parking a KVM vCPU. However, there can be cases where a platform needs to create and immediately park the vCPU during early stages of vcpu init which can later be reused when vcpu thread gets initialized. This would help detect failures with kvm_create_vcpu at an early stage. Suggested-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2024-07-26spapr: Migrate ail-mode-3 spapr capNicholas Piggin
This cap did not add the migration code when it was introduced. This results in migration failure when changing the default using the command line. Cc: qemu-stable@nongnu.org Fixes: ccc5a4c5e10 ("spapr: Add SPAPR_CAP_AIL_MODE_3 for AIL mode 3 support for H_SET_MODE hcall") Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2024-07-24crypto: propagate errors from TLS session I/O callbacksDaniel P. Berrangé
GNUTLS doesn't know how to perform I/O on anything other than plain FDs, so the TLS session provides it with some I/O callbacks. The GNUTLS API design requires these callbacks to return a unix errno value, which means we're currently loosing the useful QEMU "Error" object. This changes the I/O callbacks in QEMU to stash the "Error" object in the QCryptoTLSSession class, and fetch it when seeing an I/O error returned from GNUTLS, thus preserving useful error messages. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2024-07-24crypto: push error reporting into TLS session I/O APIsDaniel P. Berrangé
The current TLS session I/O APIs just return a synthetic errno value on error, which has been translated from a gnutls error value. This looses a large amount of valuable information that distinguishes different scenarios. Pushing population of the "Error *errp" object into the TLS session I/O APIs gives more detailed error information. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2024-07-24Merge tag 'hw-misc-20240723' of https://github.com/philmd/qemu into stagingRichard Henderson
Misc HW patch queue - Restrict probe_access*() functions to TCG (Phil) - Extract do_invalidate_device_tlb from vtd_process_device_iotlb_desc (Clément) - Fixes in Loongson IPI model (Bibo & Phil) - Make docs/interop/firmware.json compatible with qapi-gen.py script (Thomas) - Correct MPC I2C MMIO region size (Zoltan) - Remove useless cast in Loongson3 Virt machine (Yao) - Various uses of range overlap API (Yao) - Use ERRP_GUARD macro in nubus_virtio_mmio_realize (Zhao) - Use DMA memory API in Goldfish UART model (Phil) - Expose fifo8_pop_buf and introduce fifo8_drop (Phil) - MAINTAINERS updates (Zhao, Phil) # -----BEGIN PGP SIGNATURE----- # # iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmagFF8ACgkQ4+MsLN6t # wN5bKg//f5TwUhsy2ff0FJpHheDOj/9Gc2nZ1U/Fp0E5N3sz3A7MGp91wye6Xwi3 # XG34YN9LK1AVzuCdrEEs5Uaxs1ZS1R2mV+fZaGHwYYxPDdnXxGyp/2Q0eyRxzbcN # zxE2hWscYSZbPVEru4HvZJKfp4XnE1cqA78fJKMAdtq0IPq38tmQNRlJ+gWD9dC6 # ZUHXPFf3DnucvVuwqb0JYO/E+uJpcTtgR6pc09Xtv/HFgMiS0vKZ1I/6LChqAUw9 # eLMpD/5V2naemVadJe98/dL7gIUnhB8GTjsb4ioblG59AO/uojutwjBSQvFxBUUw # U5lX9OSn20ouwcGiqimsz+5ziwhCG0R6r1zeQJFqUxrpZSscq7NQp9ygbvirm+wS # edLc8yTPf4MtYOihzPP9jLPcXPZjEV64gSnJISDDFYWANCrysX3suaFEOuVYPl+s # ZgQYRVSSYOYHgNqBSRkPKKVUxskSQiqLY3SfGJG4EA9Ktt5lD1cLCXQxhdsqphFm # Ws3zkrVVL0EKl4v/4MtCgITIIctN1ZJE9u3oPJjASqSvK6EebFqAJkc2SidzKHz0 # F3iYX2AheWNHCQ3HFu023EvFryjlxYk95fs2f6Uj2a9yVbi813qsvd3gcZ8t0kTT # +dmQwpu1MxjzZnA6838R6OCMnC+UpMPqQh3dPkU/5AF2fc3NnN8= # =J/I2 # -----END PGP SIGNATURE----- # gpg: Signature made Wed 24 Jul 2024 06:36:47 AM AEST # gpg: using RSA key FAABE75E12917221DCFD6BB2E3E32C2CDEADC0DE # gpg: Good signature from "Philippe Mathieu-Daudé (F4BUG) <f4bug@amsat.org>" [full] * tag 'hw-misc-20240723' of https://github.com/philmd/qemu: (28 commits) MAINTAINERS: Add myself as a reviewer of machine core MAINTAINERS: Cover guest-agent in QAPI schema util/fifo8: Introduce fifo8_drop() util/fifo8: Expose fifo8_pop_buf() util/fifo8: Rename fifo8_pop_buf() -> fifo8_pop_bufptr() util/fifo8: Rename fifo8_peek_buf() -> fifo8_peek_bufptr() util/fifo8: Use fifo8_reset() in fifo8_create() util/fifo8: Fix style chardev/char-fe: Document returned value on error hw/char/goldfish: Use DMA memory API hw/nubus/virtio-mmio: Fix missing ERRP_GUARD() in realize handler dump: make range overlap check more readable crypto/block-luks: make range overlap check more readable system/memory_mapping: make range overlap check more readable sparc/ldst_helper: make range overlap check more readable cxl/mailbox: make range overlap check more readable util/range: Make ranges_overlap() return bool hw/mips/loongson3_virt: remove useless type cast hw/i2c/mpc_i2c: Fix mmio region size docs/interop/firmware.json: convert "Example" section ... Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-07-24Merge tag 'pull-vfio-20240723-1' of https://github.com/legoater/qemu into ↵Richard Henderson
staging vfio queue: * IOMMUFD Dirty Tracking support * Fix for a possible SEGV in IOMMU type1 container * Dropped initialization of host IOMMU device with mdev devices # -----BEGIN PGP SIGNATURE----- # # iQIzBAABCAAdFiEEoPZlSPBIlev+awtgUaNDx8/77KEFAmafyVUACgkQUaNDx8/7 # 7KGebRAAzEYxvstDxSPNF+1xx937TKbRpiKYtspTfEgu4Ht50MwO2ZqnVWzTBSwa # qcjhDf2avMBpBvkp4O9fR7nXR0HRN2KvYrBSThZ3Qpqu4KjxCAGcHI5uYmgfizYh # BBLrw3eWME5Ry220TinQF5KFl50vGq7Z/mku5N5Tgj2qfTfCXYK1Kc19SyAga49n # LSokTIjZAGJa4vxrE7THawaEUjFRjfCJey64JUs/TPJaGr4R1snJcWgETww6juUE # 9OSw/xl0AoQhaN/ZTRC1qCsBLUI2MVPsC+x+vqVK62HlTjCx+uDRVQ8KzfDzjCeH # gaLkMjxJSuJZMpm4UU7DBzDGEGcEBCGeNyFt37BSqqPPpX55CcFhj++d8vqTiwpF # YzmTNd/znxcZTw6OJN9sQZohh+NeS86CVZ3x31HD3dXifhRf17jbh7NoIyi+0ZCb # N+mytOH5BXsD+ddwbk+yMaxXV43Fgz7ThG5tB1tjhhNtLZHDA5ezFvGZ5F/FJrqE # xAbjOhz5MC+RcOVNSzQJCULNqFpfE6Gqeys6btEDm/ltf4LpAe6W1HYuv8BJc19T # UsqGK2yKAuQX8GErYxJ1zqZCttVrgpsmXFYTC5iGbxC84mvsF0Iti96IdXz9gfzN # Vlb2OxoefcOwVqIhbkvTZW0ZwYGGDDPAYhLMfr5lSuRqj123OOo= # =cViP # -----END PGP SIGNATURE----- # gpg: Signature made Wed 24 Jul 2024 01:16:37 AM AEST # gpg: using RSA key A0F66548F04895EBFE6B0B6051A343C7CFFBECA1 # gpg: Good signature from "Cédric Le Goater <clg@kaod.org>" [undefined] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: A0F6 6548 F048 95EB FE6B 0B60 51A3 43C7 CFFB ECA1 * tag 'pull-vfio-20240723-1' of https://github.com/legoater/qemu: vfio/common: Allow disabling device dirty page tracking vfio/migration: Don't block migration device dirty tracking is unsupported vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support vfio/iommufd: Probe and request hwpt dirty tracking capability vfio/{iommufd, container}: Invoke HostIOMMUDevice::realize() during attach_device() vfio/iommufd: Add hw_caps field to HostIOMMUDeviceCaps vfio/{iommufd,container}: Remove caps::aw_bits vfio/iommufd: Introduce auto domain creation vfio/ccw: Don't initialize HOST_IOMMU_DEVICE with mdev vfio/ap: Don't initialize HOST_IOMMU_DEVICE with mdev vfio/iommufd: Return errno in iommufd_cdev_attach_ioas_hwpt() backends/iommufd: Extend iommufd_backend_get_device_info() to fetch HW capabilities vfio/iommufd: Don't initialize nor set a HOST_IOMMU_DEVICE with mdev vfio/pci: Extract mdev check into an helper hw/vfio/container: Fix SIGSEV on vfio_container_instance_finalize() Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-07-24Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into stagingRichard Henderson
* target/i386/kvm: support for reading RAPL MSRs using a helper program * hpet: emulation improvements # -----BEGIN PGP SIGNATURE----- # # iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmaelL4UHHBib256aW5p # QHJlZGhhdC5jb20ACgkQv/vSX3jHroMXoQf+K77lNlHLETSgeeP3dr7yZPOmXjjN # qFY/18jiyLw7MK1rZC09fF+n9SoaTH8JDKupt0z9M1R10HKHLIO04f8zDE+dOxaE # Rou3yKnlTgFPGSoPPFr1n1JJfxtYlLZRoUzaAcHUaa4W7JR/OHJX90n1Rb9MXeDk # jV6P0v1FWtIDdM6ERm9qBGoQdYhj6Ra2T4/NZKJFXwIhKEkxgu4yO7WXv8l0dxQz # jE4fKotqAvrkYW1EsiVZm30lw/19duhvGiYeQXoYhk8KKXXjAbJMblLITSNWsCio # 3l6Uud/lOxekkJDAq5nH3H9hCBm0WwvwL+0vRf3Mkr+/xRGvrhtmUdp8NQ== # =00mB # -----END PGP SIGNATURE----- # gpg: Signature made Tue 23 Jul 2024 03:19:58 AM AEST # gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83 # gpg: issuer "pbonzini@redhat.com" # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] * tag 'for-upstream' of https://gitlab.com/bonzini/qemu: hpet: avoid timer storms on periodic timers hpet: store full 64-bit target value of the counter hpet: accept 64-bit reads and writes hpet: place read-only bits directly in "new_val" hpet: remove unnecessary variable "index" hpet: ignore high bits of comparator in 32-bit mode hpet: fix and cleanup persistence of interrupt status Add support for RAPL MSRs in KVM/Qemu tools: build qemu-vmsr-helper qio: add support for SO_PEERCRED for socket channel target/i386: do not crash if microvm guest uses SGX CPUID leaves Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-07-24Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu ↵Richard Henderson
into staging virtio,pci,pc: features,fixes pci: Initial support for SPDM Responders cxl: Add support for scan media, feature commands, device patrol scrub control, DDR5 ECS control, firmware updates virtio: in-order support virtio-net: support for SR-IOV emulation (note: known issues on s390, might get reverted if not fixed) smbios: memory device size is now configurable per Machine cpu: architecture agnostic code to support vCPU Hotplug Fixes, cleanups all over the place. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # -----BEGIN PGP SIGNATURE----- # # iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmae9l8PHG1zdEByZWRo # YXQuY29tAAoJECgfDbjSjVRp8fYH/impBH9nViO/WK48io4mLSkl0EUL8Y/xrMvH # zKFCKaXq8D96VTt1Z4EGKYgwG0voBKZaCEKYU/0ARGnSlSwxINQ8ROCnBWMfn2sx # yQt08EXVMznNLtXjc6U5zCoCi6SaV85GH40No3MUFXBQt29ZSlFqO/fuHGZHYBwS # wuVKvTjjNF4EsGt3rS4Qsv6BwZWMM+dE6yXpKWk68kR8IGp+6QGxkMbWt9uEX2Md # VuemKVnFYw0XGCGy5K+ZkvoA2DGpEw0QxVSOMs8CI55Oc9SkTKz5fUSzXXGo1if+ # M1CTjOPJu6pMym6gy6XpFa8/QioDA/jE2vBQvfJ64TwhJDV159s= # =k8e9 # -----END PGP SIGNATURE----- # gpg: Signature made Tue 23 Jul 2024 10:16:31 AM AEST # gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469 # gpg: issuer "mst@redhat.com" # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [undefined] # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [undefined] # gpg: WARNING: The key's User ID is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (61 commits) hw/nvme: Add SPDM over DOE support backends: Initial support for SPDM socket support hw/pci: Add all Data Object Types defined in PCIe r6.0 tests/acpi: Add expected ACPI AML files for RISC-V tests/qtest/bios-tables-test.c: Enable basic testing for RISC-V tests/acpi: Add empty ACPI data files for RISC-V tests/qtest/bios-tables-test.c: Remove the fall back path tests/acpi: update expected DSDT blob for aarch64 and microvm acpi/gpex: Create PCI link devices outside PCI root bridge tests/acpi: Allow DSDT acpi table changes for aarch64 hw/riscv/virt-acpi-build.c: Update the HID of RISC-V UART hw/riscv/virt-acpi-build.c: Add namespace devices for PLIC and APLIC virtio-iommu: Add trace point on virtio_iommu_detach_endpoint_from_domain hw/vfio/common: Add vfio_listener_region_del_iommu trace event virtio-iommu: Remove the end point on detach virtio-iommu: Free [host_]resv_ranges on unset_iommu_devices virtio-iommu: Remove probe_done Revert "virtio-iommu: Clear IOMMUDevice when VFIO device is unplugged" gdbstub: Add helper function to unregister GDB register space physmem: Add helper function to destroy CPU AddressSpace ... Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-07-23util/fifo8: Introduce fifo8_drop()Philippe Mathieu-Daudé
Add the fifo8_drop() helper for clarity. It is a simple wrapper over fifo8_pop_buf(). Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Message-Id: <20240722160745.67904-8-philmd@linaro.org>
2024-07-23util/fifo8: Expose fifo8_pop_buf()Philippe Mathieu-Daudé
Extract fifo8_pop_buf() from hw/scsi/esp.c and expose it as part of the <qemu/fifo8.h> API. This function takes care of non-contiguous (wrapped) FIFO buffer (which is an implementation detail). Suggested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Message-Id: <20240722160745.67904-7-philmd@linaro.org>
2024-07-23util/fifo8: Rename fifo8_pop_buf() -> fifo8_pop_bufptr()Philippe Mathieu-Daudé
Since fifo8_pop_buf() return a const buffer (which points directly into the FIFO backing store). Rename it using the 'bufptr' suffix to better reflect that it is a pointer to the internal buffer that is being returned. This will help differentiate with methods *copying* the FIFO data. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Message-Id: <20240722160745.67904-6-philmd@linaro.org>
2024-07-23util/fifo8: Rename fifo8_peek_buf() -> fifo8_peek_bufptr()Philippe Mathieu-Daudé
Since fifo8_peek_buf() return a const buffer (which points directly into the FIFO backing store). Rename it using the 'bufptr' suffix to better reflect that it is a pointer to the internal buffer that is being returned. This will help differentiate with methods *copying* the FIFO data. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Message-Id: <20240722160745.67904-5-philmd@linaro.org>
2024-07-23util/fifo8: Fix stylePhilippe Mathieu-Daudé
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Message-Id: <20240722160745.67904-3-philmd@linaro.org>
2024-07-23chardev/char-fe: Document returned value on errorPhilippe Mathieu-Daudé
qemu_chr_fe_add_watch() and qemu_chr_fe_write[_all]() return -1 on error. Mention it in the documentation. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Message-Id: <20240722160745.67904-2-philmd@linaro.org>
2024-07-23util/range: Make ranges_overlap() return boolYao Xingtao
Just like range_overlaps_range(), use the returned bool value to check whether 2 given ranges overlap. Signed-off-by: Yao Xingtao <yaoxt.fnst@fujitsu.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-ID: <20240722040742.11513-2-yaoxt.fnst@fujitsu.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2024-07-23accel: Restrict probe_access*() functions to TCGPhilippe Mathieu-Daudé
This API is specific to TCG (already handled by hardware accelerators), so restrict it with #ifdef'ry. Remove unnecessary stubs. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20240529155918.6221-1-philmd@linaro.org>
2024-07-23vfio/common: Allow disabling device dirty page trackingJoao Martins
The property 'x-pre-copy-dirty-page-tracking' allows disabling the whole tracking of VF pre-copy phase of dirty page tracking, though it means that it will only be used at the start of the switchover phase. Add an option that disables the VF dirty page tracking, and fall back into container-based dirty page tracking. This also allows to use IOMMU dirty tracking even on VFs with their own dirty tracker scheme. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
2024-07-23vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap supportJoao Martins
ioctl(iommufd, IOMMU_HWPT_GET_DIRTY_BITMAP, arg) is the UAPI that fetches the bitmap that tells what was dirty in an IOVA range. A single bitmap is allocated and used across all the hwpts sharing an IOAS which is then used in log_sync() to set Qemu global bitmaps. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
2024-07-23vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking supportJoao Martins
ioctl(iommufd, IOMMU_HWPT_SET_DIRTY_TRACKING, arg) is the UAPI that enables or disables dirty page tracking. The ioctl is used if the hwpt has been created with dirty tracking supported domain (stored in hwpt::flags) and it is called on the whole list of iommu domains. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com>
2024-07-23vfio/iommufd: Probe and request hwpt dirty tracking capabilityJoao Martins
In preparation to using the dirty tracking UAPI, probe whether the IOMMU supports dirty tracking. This is done via the data stored in hiod::caps::hw_caps initialized from GET_HW_INFO. Qemu doesn't know if VF dirty tracking is supported when allocating hardware pagetable in iommufd_cdev_autodomains_get(). This is because VFIODevice migration state hasn't been initialized *yet* hence it can't pick between VF dirty tracking vs IOMMU dirty tracking. So, if IOMMU supports dirty tracking it always creates HWPTs with IOMMU_HWPT_ALLOC_DIRTY_TRACKING even if later on VFIOMigration decides to use VF dirty tracking instead. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> [ clg: - Fixed vbasedev->iommu_dirty_tracking assignment in iommufd_cdev_autodomains_get() - Added warning for heterogeneous dirty page tracking support in iommufd_cdev_autodomains_get() ] Signed-off-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
2024-07-23vfio/{iommufd, container}: Invoke HostIOMMUDevice::realize() during ↵Joao Martins
attach_device() Move the HostIOMMUDevice::realize() to be invoked during the attach of the device before we allocate IOMMUFD hardware pagetable objects (HWPT). This allows the use of the hw_caps obtained by IOMMU_GET_HW_INFO that essentially tell if the IOMMU behind the device supports dirty tracking. Note: The HostIOMMUDevice data from legacy backend is static and doesn't need any information from the (type1-iommu) backend to be initialized. In contrast however, the IOMMUFD HostIOMMUDevice data requires the iommufd FD to be connected and having a devid to be able to successfully GET_HW_INFO. This means vfio_device_hiod_realize() is called in different places within the backend .attach_device() implementation. Suggested-by: Cédric Le Goater <clg@redhat.cm> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> [ clg: Fixed error handling in iommufd_cdev_attach() ] Signed-off-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com>
2024-07-23vfio/iommufd: Add hw_caps field to HostIOMMUDeviceCapsJoao Martins
Store the value of @caps returned by iommufd_backend_get_device_info() in a new field HostIOMMUDeviceCaps::hw_caps. Right now the only value is whether device IOMMU supports dirty tracking (IOMMU_HW_CAP_DIRTY_TRACKING). This is in preparation for HostIOMMUDevice::realize() being called early during attach_device(). Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com>
2024-07-23vfio/{iommufd,container}: Remove caps::aw_bitsJoao Martins
Remove caps::aw_bits which requires the bcontainer::iova_ranges being initialized after device is actually attached. Instead defer that to .get_cap() and call vfio_device_get_aw_bits() directly. This is in preparation for HostIOMMUDevice::realize() being called early during attach_device(). Suggested-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com>
2024-07-23vfio/iommufd: Introduce auto domain creationJoao Martins
There's generally two modes of operation for IOMMUFD: 1) The simple user API which intends to perform relatively simple things with IOMMUs e.g. DPDK. The process generally creates an IOAS and attaches to VFIO and mainly performs IOAS_MAP and UNMAP. 2) The native IOMMUFD API where you have fine grained control of the IOMMU domain and model it accordingly. This is where most new feature are being steered to. For dirty tracking 2) is required, as it needs to ensure that the stage-2/parent IOMMU domain will only attach devices that support dirty tracking (so far it is all homogeneous in x86, likely not the case for smmuv3). Such invariant on dirty tracking provides a useful guarantee to VMMs that will refuse incompatible device attachments for IOMMU domains. Dirty tracking insurance is enforced via HWPT_ALLOC, which is responsible for creating an IOMMU domain. This is contrast to the 'simple API' where the IOMMU domain is created by IOMMUFD automatically when it attaches to VFIO (usually referred as autodomains) but it has the needed handling for mdevs. To support dirty tracking with the advanced IOMMUFD API, it needs similar logic, where IOMMU domains are created and devices attached to compatible domains. Essentially mimicking kernel iommufd_device_auto_get_domain(). With mdevs given there's no IOMMU domain it falls back to IOAS attach. The auto domain logic allows different IOMMU domains to be created when DMA dirty tracking is not desired (and VF can provide it), and others where it is. Here it is not used in this way given how VFIODevice migration state is initialized after the device attachment. But such mixed mode of IOMMU dirty tracking + device dirty tracking is an improvement that can be added on. Keep the 'all of nothing' of type1 approach that we have been using so far between container vs device dirty tracking. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> [ clg: Added ERRP_GUARD() in iommufd_cdev_autodomains_get() ] Signed-off-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com>
2024-07-23backends/iommufd: Extend iommufd_backend_get_device_info() to fetch HW ↵Joao Martins
capabilities The helper will be able to fetch vendor agnostic IOMMU capabilities supported both by hardware and software. Right now it is only iommu dirty tracking. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com>
2024-07-23vfio/pci: Extract mdev check into an helperJoao Martins
In preparation to skip initialization of the HostIOMMUDevice for mdev, extract the checks that validate if a device is an mdev into helpers. A vfio_device_is_mdev() is created, and subsystems consult VFIODevice::mdev to check if it's mdev or not. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com>
2024-07-23Merge tag 'pull-tcg-20240723' of https://gitlab.com/rth7680/qemu into stagingRichard Henderson
accel/tcg: Export set/clear_helper_retaddr target/arm: Use set_helper_retaddr for dc_zva, sve and sme target/ppc: Tidy dcbz helpers target/ppc: Use set_helper_retaddr for dcbz target/s390x: Use set_helper_retaddr in mem_helper.c # -----BEGIN PGP SIGNATURE----- # # iQFRBAABCgA7FiEEekgeeIaLTbaoWgXAZN846K9+IV8FAmafJKIdHHJpY2hhcmQu # aGVuZGVyc29uQGxpbmFyby5vcmcACgkQZN846K9+IV+FBAf7Bup+karxeGHZx2rN # cPeF248bcCWTxBWHK7dsYze4KqzsrlNIJlPeOKErU2bbbRDZGhOp1/N95WVz+P8V # 6Ny63WTsAYkaFWKxE6Jf0FWJlGw92btk75pTV2x/TNZixg7jg0vzVaYkk0lTYc5T # m5e4WycYEbzYm0uodxI09i+wFvpd+7WCnl6xWtlJPWZENukvJ36Ss43egFMDtuMk # vTJuBkS9wpwZ9MSi6EY6M+Raieg8bfaotInZeDvE/yRPNi7CwrA7Dgyc1y626uBA # joGkYRLzhRgvT19kB3bvFZi1AXa0Pxr+j0xJqwspP239Gq5qezlS5Bv/DrHdmGHA # jaqSwg== # =XgUE # -----END PGP SIGNATURE----- # gpg: Signature made Tue 23 Jul 2024 01:33:54 PM AEST # gpg: using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F # gpg: issuer "richard.henderson@linaro.org" # gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [ultimate] * tag 'pull-tcg-20240723' of https://gitlab.com/rth7680/qemu: target/riscv: Simplify probing in vext_ldff target/s390x: Use set/clear_helper_retaddr in mem_helper.c target/s390x: Use user_or_likely in access_memmove target/s390x: Use user_or_likely in do_access_memset target/ppc: Improve helper_dcbz for user-only target/ppc: Merge helper_{dcbz,dcbzep} target/ppc: Split out helper_dbczl for 970 target/ppc: Hoist dcbz_size out of dcbz_common target/ppc/mem_helper.c: Remove a conditional from dcbz_common() target/arm: Use set/clear_helper_retaddr in SVE and SME helpers target/arm: Use set/clear_helper_retaddr in helper-a64.c accel/tcg: Move {set,clear}_helper_retaddr to cpu_ldst.h Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-07-23Merge tag 'nvme-next-pull-request' of https://gitlab.com/birkelund/qemu into ↵Richard Henderson
staging hw/nvme patches # -----BEGIN PGP SIGNATURE----- # # iQEzBAABCgAdFiEEUigzqnXi3OaiR2bATeGvMW1PDekFAmaeiz4ACgkQTeGvMW1P # Dem5DggAkudAwZYUlKLz/FuxmOJsZ/CKL7iIu6wE3P93WTTbi4m2AL5lMFz1bOUH # 33LtjHz51bDvOsnhAwLs2TwjfhICiMJCOXEmxF9zJnO4Yo8ih9UbeE7sEukpxsVr # FJlAg5OXhdIHuo48ow7hu7BqMs58jnXhVA6zSvLU5rbKTSdG/369jyQKy5aoFPN0 # Rk+S6hqDmVMiN7u6E+QqPyB2tSbmNKkhPICu3O9fbHmaOoMFmrcvyxkd1wJ9JxwF # 8MWbuEZlIpLIIL/mCN4wzDw8VKlJ26sBJJC1b+NHmWIWmPkqMeXwcmQtWhUqsrcs # xAGUcjgJuJ3Fu6Xzt+09Y+FXO8v0oQ== # =vCDb # -----END PGP SIGNATURE----- # gpg: Signature made Tue 23 Jul 2024 02:39:26 AM AEST # gpg: using RSA key 522833AA75E2DCE6A24766C04DE1AF316D4F0DE9 # gpg: Good signature from "Klaus Jensen <its@irrelevant.dk>" [unknown] # gpg: aka "Klaus Jensen <k.jensen@samsung.com>" [unknown] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: DDCA 4D9C 9EF9 31CC 3468 4272 63D5 6FC5 E55D A838 # Subkey fingerprint: 5228 33AA 75E2 DCE6 A247 66C0 4DE1 AF31 6D4F 0DE9 * tag 'nvme-next-pull-request' of https://gitlab.com/birkelund/qemu: hw/nvme: remove useless type cast hw/nvme: actually implement abort hw/nvme: add cross namespace copy support hw/nvme: fix memory leak in nvme_dsm Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-07-23accel/tcg: Move {set,clear}_helper_retaddr to cpu_ldst.hRichard Henderson
Use of these in helpers goes hand-in-hand with tlb_vaddr_to_host and other probing functions. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-07-22hw/nvme: Add SPDM over DOE supportWilfred Mallawa
Setup Data Object Exchange (DOE) as an extended capability for the NVME controller and connect SPDM to it (CMA) to it. Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Klaus Jensen <k.jensen@samsung.com> Message-Id: <20240703092027.644758-4-alistair.francis@wdc.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22backends: Initial support for SPDM socket supportHuai-Cheng Kuo
SPDM enables authentication, attestation and key exchange to assist in providing infrastructure security enablement. It's a standard published by the DMTF [1]. SPDM supports multiple transports, including PCIe DOE and MCTP. This patch adds support to QEMU to connect to an external SPDM instance. SPDM support can be added to any QEMU device by exposing a TCP socket to a SPDM server. The server can then implement the SPDM decoding/encoding support, generally using libspdm [2]. This is similar to how the current TPM implementation works and means that the heavy lifting of setting up certificate chains, capabilities, measurements and complex crypto can be done outside QEMU by a well supported and tested library. 1: https://www.dmtf.org/standards/SPDM 2: https://github.com/DMTF/libspdm Signed-off-by: Huai-Cheng Kuo <hchkuo@avery-design.com.tw> Signed-off-by: Chris Browy <cbrowy@avery-design.com> Co-developed-by: Jonathan Cameron <Jonathan.cameron@huawei.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [ Changes by WM - Bug fixes from testing ] Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com> [ Changes by AF: - Convert to be more QEMU-ified - Move to backends as it isn't PCIe specific ] Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20240703092027.644758-3-alistair.francis@wdc.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22hw/pci: Add all Data Object Types defined in PCIe r6.0Alistair Francis
Add all of the defined protocols/features from the PCIe-SIG r6.0 "Table 6-32 PCI-SIG defined Data Object Types (Vendor ID = 0001h)" table. Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Wilfred Mallawa <wilfred.mallawa@wdc.com> Message-Id: <20240703092027.644758-2-alistair.francis@wdc.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22virtio-iommu: Remove probe_doneEric Auger
Now we have switched to PCIIOMMUOps to convey host IOMMU information, the host reserved regions are transmitted when the PCIe topology is built. This happens way before the virtio-iommu driver calls the probe request. So let's remove the probe_done flag that allowed to check the probe was not done before the IOMMU MR got enabled. Besides this probe_done flag had a flaw wrt migration since it was not saved/restored. The only case at risk is if 2 devices were plugged to a PCIe to PCI bridge and thus aliased. First of all we discovered in the past this case was not properly supported for neither SMMU nor virtio-iommu on guest kernel side: see [RFC] virtio-iommu: Take into account possible aliasing in virtio_iommu_mr() https://lore.kernel.org/all/20230116124709.793084-1-eric.auger@redhat.com/ If this were supported by the guest kernel, it is unclear what the call sequence would be from a virtio-iommu driver point of view. Signed-off-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20240716094619.1713905-3-eric.auger@redhat.com> Tested-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22gdbstub: Add helper function to unregister GDB register spaceSalil Mehta
Add common function to help unregister the GDB register space. This shall be done in context to the CPU unrealization. Note: These are common functions exported to arch specific code. For example, for ARM this code is being referred in associated arch specific patch-set: Link: https://lore.kernel.org/qemu-devel/20230926103654.34424-1-salil.mehta@huawei.com/ Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Xianglai Li <lixianglai@loongson.cn> Tested-by: Miguel Luis <miguel.luis@oracle.com> Reviewed-by: Shaoqin Huang <shahuang@redhat.com> Reviewed-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com> Tested-by: Zhao Liu <zhao1.liu@intel.com> Acked-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20240716111502.202344-8-salil.mehta@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22physmem: Add helper function to destroy CPU AddressSpaceSalil Mehta
Virtual CPU Hot-unplug leads to unrealization of a CPU object. This also involves destruction of the CPU AddressSpace. Add common function to help destroy the CPU AddressSpace. Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Xianglai Li <lixianglai@loongson.cn> Tested-by: Miguel Luis <miguel.luis@oracle.com> Reviewed-by: Shaoqin Huang <shahuang@redhat.com> Tested-by: Zhao Liu <zhao1.liu@intel.com> Acked-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20240716111502.202344-7-salil.mehta@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22hw/acpi: Update CPUs AML with cpu-(ctrl)dev changeSalil Mehta
CPUs Control device(\\_SB.PCI0) register interface for the x86 arch is IO port based and existing CPUs AML code assumes _CRS objects would evaluate to a system resource which describes IO Port address. But on ARM arch CPUs control device(\\_SB.PRES) register interface is memory-mapped hence _CRS object should evaluate to system resource which describes memory-mapped base address. Update build CPUs AML function to accept both IO/MEMORY region spaces and accordingly update the _CRS object. Co-developed-by: Keqian Zhu <zhukeqian1@huawei.com> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Xianglai Li <lixianglai@loongson.cn> Tested-by: Miguel Luis <miguel.luis@oracle.com> Reviewed-by: Shaoqin Huang <shahuang@redhat.com> Tested-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20240716111502.202344-6-salil.mehta@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22hw/acpi: Update GED _EVT method AML with CPU scanSalil Mehta
OSPM evaluates _EVT method to map the event. The CPU hotplug event eventually results in start of the CPU scan. Scan figures out the CPU and the kind of event(plug/unplug) and notifies it back to the guest. Update the GED AML _EVT method with the call to method \\_SB.CPUS.CSCN (via \\_SB.GED.CSCN) Architecture specific code [1] might initialize its CPUs AML code by calling common function build_cpus_aml() like below for ARM: build_cpus_aml(scope, ms, opts, xx_madt_cpu_entry, memmap[VIRT_CPUHP_ACPI].base, "\\_SB", "\\_SB.GED.CSCN", AML_SYSTEM_MEMORY); [1] https://lore.kernel.org/qemu-devel/20240613233639.202896-13-salil.mehta@huawei.com/ Co-developed-by: Keqian Zhu <zhukeqian1@huawei.com> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com> Tested-by: Xianglai Li <lixianglai@loongson.cn> Tested-by: Miguel Luis <miguel.luis@oracle.com> Reviewed-by: Shaoqin Huang <shahuang@redhat.com> Tested-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20240716111502.202344-5-salil.mehta@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22hw/acpi: Update ACPI GED framework to support vCPU HotplugSalil Mehta
ACPI GED (as described in the ACPI 6.4 spec) uses an interrupt listed in the _CRS object of GED to intimate OSPM about an event. Later then demultiplexes the notified event by evaluating ACPI _EVT method to know the type of event. Use ACPI GED to also notify the guest kernel about any CPU hot(un)plug events. Note, GED interface is used by many hotplug events like memory hotplug, NVDIMM hotplug and non-hotplug events like system power down event. Each of these can be selected using a bit in the 32 bit GED IO interface. A bit has been reserved for the CPU hotplug event. ACPI CPU hotplug related initialization should only happen if ACPI_CPU_HOTPLUG support has been enabled for particular architecture. Add cpu_hotplug_hw_init() stub to avoid compilation break. Co-developed-by: Keqian Zhu <zhukeqian1@huawei.com> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Shaoqin Huang <shahuang@redhat.com> Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com> Tested-by: Xianglai Li <lixianglai@loongson.cn> Tested-by: Miguel Luis <miguel.luis@oracle.com> Reviewed-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com> Tested-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Message-Id: <20240716111502.202344-4-salil.mehta@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Igor Mammedov <imammedo@redhat.com>
2024-07-22hw/acpi: Move CPU ctrl-dev MMIO region len macro to common header fileSalil Mehta
CPU ctrl-dev MMIO region length could be used in ACPI GED and various other architecture specific places. Move ACPI_CPU_HOTPLUG_REG_LEN macro to more appropriate common header file. Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Shaoqin Huang <shahuang@redhat.com> Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com> Tested-by: Xianglai Li <lixianglai@loongson.cn> Tested-by: Miguel Luis <miguel.luis@oracle.com> Tested-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20240716111502.202344-3-salil.mehta@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22accel/kvm: Extract common KVM vCPU {creation,parking} codeSalil Mehta
KVM vCPU creation is done once during the vCPU realization when Qemu vCPU thread is spawned. This is common to all the architectures as of now. Hot-unplug of vCPU results in destruction of the vCPU object in QOM but the corresponding KVM vCPU object in the Host KVM is not destroyed as KVM doesn't support vCPU removal. Therefore, its representative KVM vCPU object/context in Qemu is parked. Refactor architecture common logic so that some APIs could be reused by vCPU Hotplug code of some architectures likes ARM, Loongson etc. Update new/old APIs with trace events. New APIs qemu_{create,park,unpark}_vcpu() can be externally called. No functional change is intended here. Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Xianglai Li <lixianglai@loongson.cn> Tested-by: Miguel Luis <miguel.luis@oracle.com> Reviewed-by: Shaoqin Huang <shahuang@redhat.com> Reviewed-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20240716111502.202344-2-salil.mehta@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22smbios: make memory device size configurable per MachineIgor Mammedov
Currently QEMU describes initial[1] RAM* in SMBIOS as a series of virtual DIMMs (capped at 16Gb max) using type 17 structure entries. Which is fine for the most cases. However when starting guest with terabytes of RAM this leads to too many memory device structures, which eventually upsets linux kernel as it reserves only 64K for these entries and when that border is crossed out it runs out of reserved memory. Instead of partitioning initial RAM on 16Gb DIMMs, use maximum possible chunk size that SMBIOS spec allows[2]. Which lets encode RAM in lower 31 bits of 32bit field (which amounts upto 2047Tb per DIMM). As result initial RAM will generate only one type 17 structure until host/guest reach ability to use more RAM in the future. Compat changes: We can't unconditionally change chunk size as it will break QEMU<->guest ABI (and migration). Thus introduce a new machine class field that would let older versioned machines to use legacy 16Gb chunks, while new(er) machine type[s] use maximum possible chunk size. PS: While it might seem to be risky to rise max entry size this large (much beyond of what current physical RAM modules support), I'd not expect it causing much issues, modulo uncovering bugs in software running within guest. And those should be fixed on guest side to handle SMBIOS spec properly, especially if guest is expected to support so huge RAM configs. In worst case, QEMU can reduce chunk size later if we would care enough about introducing a workaround for some 'unfixable' guest OS, either by fixing up the next machine type or giving users a CLI option to customize it. 1) Initial RAM - is RAM configured with help '-m SIZE' CLI option/ implicitly defined by machine. It doesn't include memory configured with help of '-device' option[s] (pcdimm,nvdimm,...) 2) SMBIOS 3.1.0 7.18.5 Memory Device — Extended Size PS: * tested on 8Tb host with RHEL6 guest, which seems to parse type 17 SMBIOS table entries correctly (according to 'dmidecode'). Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20240715122417.4059293-1-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22virtio-pci: Implement SR-IOV PFAkihiko Odaki
Allow user to attach SR-IOV VF to a virtio-pci PF. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-Id: <20240715-sriov-v5-6-3f5539093ffc@daynix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22pcie_sriov: Allow user to create SR-IOV deviceAkihiko Odaki
A user can create a SR-IOV device by specifying the PF with the sriov-pf property of the VFs. The VFs must be added before the PF. A user-creatable VF must have PCIDeviceClass::sriov_vf_user_creatable set. Such a VF cannot refer to the PF because it is created before the PF. A PF that user-creatable VFs can be attached calls pcie_sriov_pf_init_from_user_created_vfs() during realization and pcie_sriov_pf_exit() when exiting. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-Id: <20240715-sriov-v5-5-3f5539093ffc@daynix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22Add support for RAPL MSRs in KVM/QemuAnthony Harivel
Starting with the "Sandy Bridge" generation, Intel CPUs provide a RAPL interface (Running Average Power Limit) for advertising the accumulated energy consumption of various power domains (e.g. CPU packages, DRAM, etc.). The consumption is reported via MSRs (model specific registers) like MSR_PKG_ENERGY_STATUS for the CPU package power domain. These MSRs are 64 bits registers that represent the accumulated energy consumption in micro Joules. They are updated by microcode every ~1ms. For now, KVM always returns 0 when the guest requests the value of these MSRs. Use the KVM MSR filtering mechanism to allow QEMU handle these MSRs dynamically in userspace. To limit the amount of system calls for every MSR call, create a new thread in QEMU that updates the "virtual" MSR values asynchronously. Each vCPU has its own vMSR to reflect the independence of vCPUs. The thread updates the vMSR values with the ratio of energy consumed of the whole physical CPU package the vCPU thread runs on and the thread's utime and stime values. All other non-vCPU threads are also taken into account. Their energy consumption is evenly distributed among all vCPUs threads running on the same physical CPU package. To overcome the problem that reading the RAPL MSR requires priviliged access, a socket communication between QEMU and the qemu-vmsr-helper is mandatory. You can specified the socket path in the parameter. This feature is activated with -accel kvm,rapl=true,path=/path/sock.sock Actual limitation: - Works only on Intel host CPU because AMD CPUs are using different MSR adresses. - Only the Package Power-Plane (MSR_PKG_ENERGY_STATUS) is reported at the moment. Signed-off-by: Anthony Harivel <aharivel@redhat.com> Link: https://lore.kernel.org/r/20240522153453.1230389-4-aharivel@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-22hw/nvme: add cross namespace copy supportArun Kumar
Extend copy command to copy user data across different namespaces via support for specifying a namespace for each source range Signed-off-by: Arun Kumar <arun.kka@samsung.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2024-07-22qio: add support for SO_PEERCRED for socket channelAnthony Harivel
The function qio_channel_get_peercred() returns a pointer to the credentials of the peer process connected to this socket. This credentials structure is defined in <sys/socket.h> as follows: struct ucred { pid_t pid; /* Process ID of the sending process */ uid_t uid; /* User ID of the sending process */ gid_t gid; /* Group ID of the sending process */ }; The use of this function is possible only for connected AF_UNIX stream sockets and for AF_UNIX stream and datagram socket pairs. On platform other than Linux, the function return 0. Signed-off-by: Anthony Harivel <aharivel@redhat.com> Link: https://lore.kernel.org/r/20240522153453.1230389-2-aharivel@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>