aboutsummaryrefslogtreecommitdiff
path: root/include/hw
AgeCommit message (Collapse)Author
2023-06-30vfio/migration: Make VFIO migration non-experimentalAvihai Horon
The major parts of VFIO migration are supported today in QEMU. This includes basic VFIO migration, device dirty page tracking and precopy support. Thus, at this point in time, it seems appropriate to make VFIO migration non-experimental: remove the x prefix from enable_migration property, change it to ON_OFF_AUTO and let the default value be AUTO. In addition, make the following adjustments: 1. When enable_migration is ON and migration is not supported, fail VFIO device realization. 2. When enable_migration is AUTO (i.e., not explicitly enabled), require device dirty tracking support. This is because device dirty tracking is currently the only method to do dirty page tracking, which is essential for migrating in a reasonable downtime. Setting enable_migration to ON will not require device dirty tracking. 3. Make migration error and blocker messages more elaborate. 4. Remove error prints in vfio_migration_query_flags(). 5. Rename trace_vfio_migration_probe() to trace_vfio_migration_realize(). Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-06-30vfio/migration: Reset bytes_transferred properlyAvihai Horon
Currently, VFIO bytes_transferred is not reset properly: 1. bytes_transferred is not reset after a VM snapshot (so a migration following a snapshot will report incorrect value). 2. bytes_transferred is a single counter for all VFIO devices, however upon migration failure it is reset multiple times, by each VFIO device. Fix it by introducing a new function vfio_reset_bytes_transferred() and calling it during migration and snapshot start. Remove existing bytes_transferred reset in VFIO migration state notifier, which is not needed anymore. Fixes: 3710586caa5d ("qapi: Add VFIO devices migration stats in Migration stats") Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-06-30vfio: Implement a common device info helperAlex Williamson
A common helper implementing the realloc algorithm for handling capabilities. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Robin Voetter <robin@streamhpc.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-06-30vfio/migration: Add support for switchover ack capabilityAvihai Horon
Loading of a VFIO device's data can take a substantial amount of time as the device may need to allocate resources, prepare internal data structures, etc. This can increase migration downtime, especially for VFIO devices with a lot of resources. To solve this, VFIO migration uAPI defines "initial bytes" as part of its precopy data stream. Initial bytes can be used in various ways to improve VFIO migration performance. For example, it can be used to transfer device metadata to pre-allocate resources in the destination. However, for this to work we need to make sure that all initial bytes are sent and loaded in the destination before the source VM is stopped. Use migration switchover ack capability to make sure a VFIO device's initial bytes are sent and loaded in the destination before the source stops the VM and attempts to complete the migration. This can significantly reduce migration downtime for some devices. Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Tested-by: YangHang Liu <yanghliu@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-06-30vfio/migration: Add VFIO migration pre-copy supportAvihai Horon
Pre-copy support allows the VFIO device data to be transferred while the VM is running. This helps to accommodate VFIO devices that have a large amount of data that needs to be transferred, and it can reduce migration downtime. Pre-copy support is optional in VFIO migration protocol v2. Implement pre-copy of VFIO migration protocol v2 and use it for devices that support it. Full description of it can be found in the following Linux commit: 4db52602a607 ("vfio: Extend the device migration protocol with PRE_COPY"). Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Tested-by: YangHang Liu <yanghliu@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-06-30vfio/migration: Store VFIO migration flags in VFIOMigrationAvihai Horon
VFIO migration flags are queried once in vfio_migration_init(). Store them in VFIOMigration so they can be used later to check the device's migration capabilities without re-querying them. This will be used in the next patch to check if the device supports precopy migration. Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Tested-by: YangHang Liu <yanghliu@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-06-29Merge tag 'accel-20230628' of https://github.com/philmd/qemu into stagingRichard Henderson
Accelerators patches - MAINTAINERS: Update Roman Bolshakov email address - HAX: Fix a memory leak - HAX/NVMM/WHPX/HVF: Rename per-accel state as AccelCPUState - KVM: Restrict specific fields from ArchCPU - WHPX: Re-enable cross-build gitlab-ci job on case sensitive filesystems - WHPX: Fix error message when setting ProcessorCount fails - exec/memory: Add definitions for memory listener priorities # -----BEGIN PGP SIGNATURE----- # # iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmScVtkACgkQ4+MsLN6t # wN7p8A//RXuX9gLFT35zx+5axocU3/XBbCsQWSvzzkYoXxmC/TLxvivO66NPGMc0 # C76b1FJUoLS/u9SyJUeIeYkL0rjkzARUKcRpiJXM21WM6ou8Nkz0kuI4ouowt+4K # i/4chTjxlN5/4PKlHHcX9ZUJ9acVj01zO1BCuj/bVsxO6WMT1kjL+kplVxxFR3aW # tlbYtUT3v4xmp94FfE2Q9lR25z4usrGnmz2rchaadlVc43kmsNcQRx+EoUdi148n # lkViRR90sacYPX586s2yxhPpUdtrXjJmEdX0X00urdPqljkRxekHtyTqG4CRZi+K # hG5NztK7p37GNNXZroL0gpHyr9IX6hZ3o8rmN3IiCOGU6BgQBRUhvvG2sblwcJ1A # SSiBK4RWtgyIGWt4U6PgVj8IAu55JuqT5xR2r34fH/zccxXlp/B13vadGs7TUK15 # oHDUT4GnKL2R29lVFTl95BzsxwaMtbB9w01CLJk8va2T/97eqtFgvJyuVC9vZb0N # 41u2RkinaQZ+hbq9TP1G21zpG0eyucEMIQ6loUd7+G3KJFjFfB4JzE2VDm0Y/OVy # 77cEEQ67wts29fMNSqqPIQCMttDrNj7JqMMknGBQS2iHPgF+B3KjwIjnRaMBt73I # CKPITOJPmb+kvIUsK3KlONdicEG57cBxFBTZW5+P9pJXF5izrAY= # =b9hj # -----END PGP SIGNATURE----- # gpg: Signature made Wed 28 Jun 2023 05:50:49 PM CEST # gpg: using RSA key FAABE75E12917221DCFD6BB2E3E32C2CDEADC0DE # gpg: Good signature from "Philippe Mathieu-Daudé (F4BUG) <f4bug@amsat.org>" [full] * tag 'accel-20230628' of https://github.com/philmd/qemu: (30 commits) exec/memory: Add symbol for the min value of memory listener priority exec/memory: Add symbol for memory listener priority for device backend exec/memory: Add symbolic value for memory listener priority for accel target/i386/WHPX: Fix error message when fail to set ProcessorCount target/riscv: Restrict KVM-specific fields from ArchCPU target/ppc: Restrict KVM-specific fields from ArchCPU target/arm: Restrict KVM-specific fields from ArchCPU hw/arm/sbsa-ref: Include missing 'sysemu/kvm.h' header hw/intc/arm_gic: Rename 'first_cpu' argument hw/intc/arm_gic: Un-inline GIC*/ITS class_name() helpers accel/kvm: Declare kvm_direct_msi_allowed in stubs accel/kvm: Re-include "exec/memattrs.h" header accel: Rename HVF 'struct hvf_vcpu_state' -> AccelCPUState accel: Rename 'cpu_state' -> 'cs' accel: Inline WHPX get_whpx_vcpu() accel: Rename WHPX 'struct whpx_vcpu' -> AccelCPUState accel: Remove WHPX unreachable error path accel: Inline NVMM get_qemu_vcpu() accel: Rename NVMM 'struct qemu_vcpu' -> AccelCPUState accel: Remove NVMM unreachable error path ... Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-06-28hw/intc/arm_gic: Un-inline GIC*/ITS class_name() helpersPhilippe Mathieu-Daudé
"kvm_arm.h" contains external and internal prototype declarations. Files under the hw/ directory should only access the KVM external API. In order to avoid machine / device models to include "kvm_arm.h" simply to get the QOM GIC/ITS class name, un-inline each class name getter to the proper device model file. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20230405160454.97436-4-philmd@linaro.org>
2023-06-28accel: Rename HVF 'struct hvf_vcpu_state' -> AccelCPUStatePhilippe Mathieu-Daudé
We want all accelerators to share the same opaque pointer in CPUState. Rename the 'hvf_vcpu_state' structure as 'AccelCPUState'. Use the generic 'accel' field of CPUState instead of 'hvf'. Replace g_malloc0() by g_new0() for readability. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Tested-by: Peter Maydell <peter.maydell@linaro.org> Message-Id: <20230624174121.11508-17-philmd@linaro.org>
2023-06-28accel: Move HAX hThread to accelerator contextPhilippe Mathieu-Daudé
hThread variable is only used by the HAX accelerator, so move it to the accelerator specific context. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20230624174121.11508-9-philmd@linaro.org>
2023-06-28accel: Rename HAX 'struct hax_vcpu_state' -> AccelCPUStatePhilippe Mathieu-Daudé
We want all accelerators to share the same opaque pointer in CPUState. Start with the HAX context, renaming its forward declarated structure 'hax_vcpu_state' as 'AccelCPUState'. Document the CPUState field. Directly use the typedef. Remove the amusing but now unnecessary casts in NVMM / WHPX. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20230624174121.11508-8-philmd@linaro.org>
2023-06-28accel: Rename 'hax_vcpu' as 'accel' in CPUStatePhilippe Mathieu-Daudé
All accelerators will share a single opaque context in CPUState. Start by renaming 'hax_vcpu' as 'accel'. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20230624174121.11508-7-philmd@linaro.org>
2023-06-28escc: emulate dip switch language layout settings on SUN keyboardHenrik Carlqvist
SUN Type 4, 5 and 5c keyboards have dip switches to choose the language layout of the keyboard. Solaris makes an ioctl to query the value of the dipswitches and uses that value to select keyboard layout. Also the SUN bios like the one in the file ss5.bin uses this value to support at least some keyboard layouts. However, the OpenBIOS provided with qemu is hardcoded to always use an US keyboard layout. Before this patch, qemu allways gave dip switch value 0x21 (US keyboard), this patch uses a command line switch like "-global escc.chnA-sunkbd-layout=de" to select dip switch value. A table is used to lookup values from arguments like: -global escc.chnA-sunkbd-layout=fr -global escc.chnA-sunkbd-layout=es But the patch also accepts numeric dip switch values directly: -global escc.chnA-sunkbd-layout=0x2b -global escc.chnA-sunkbd-layout=43 Both values above are the same and select swedish keyboard as explained in table 3-15 at https://docs.oracle.com/cd/E19683-01/806-6642/new-43/index.html Unless you want to do a full Solaris installation but happen to have access to a Sun bios file, the easiest way to test that the patch works is to: qemu-system-sparc -global escc.chnA-sunkbd-layout=sv -bios /path/to/ss5.bin If you already happen to have a Solaris installation in a qemu disk image file you can easily try different keyboard layouts after this patch is applied. Signed-off-by: Henrik Carlqvist <hc1245@poolhem.se> Message-Id: <20230623203007.56d3d182.hc981@poolhem.se> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> [MCA edit: update unsigned char to uint8_t, fix spacing issues] Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
2023-06-27virtio-gpu/win32: allocate shareable 2d resources/imagesMarc-André Lureau
Allocate pixman bits for scanouts with qemu_win32_map_alloc() so we can set a shareable handle on the associated display surface. Note: when bits are provided to pixman_image_create_bits(), you must also give the rowstride (the argument is ignored when bits is NULL) Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20230606115658.677673-11-marcandre.lureau@redhat.com>
2023-06-26accel/tcg: Store some tlb flags in CPUTLBEntryFullRichard Henderson
We have run out of bits we can use within the CPUTLBEntry comparators, as TLB_FLAGS_MASK cannot overlap alignment. Store slow_flags[] in CPUTLBEntryFull, and merge with the flags from the comparator. A new TLB_FORCE_SLOW bit is set within the comparator as an indication that the slow path must be used. Move TLB_BSWAP to TLB_SLOW_FLAGS_MASK. Since we are out of bits, we cannot create a new bit without moving an old one. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-06-26Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu ↵Richard Henderson
into staging virtio,pc,pci: fixes, features, cleanups asymmetric crypto support for cryptodev-vhost-user rom migration when rom size changes poison get, inject, clear; mock cxl events and irq support for cxl shadow virtqueue offload support for vhost-vdpa vdpa now maps shadow vrings with MAP_SHARED max_cpus went up to 1024 and we default to smbios 3.0 for pc Fixes, cleanups all over the place. In particular hw/acpi: Fix PM control register access works around a very long standing bug in memory core. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # -----BEGIN PGP SIGNATURE----- # # iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmSZl5EPHG1zdEByZWRo # YXQuY29tAAoJECgfDbjSjVRph+8H/RZodqCadmQ1evpeWs7RBSvJeZgbJTVl/9/h # +ObvEmVz2+X4D+O1Kxh54vDV0SNVq3XjyrFy3Ur57MAR6r2ZWwB6HySaeFdi4zIm # N0SMkfUylDnf7ulyjzJoXDzHOoFnqAM6fU/jcoQXBIdUeeqwPrzLOZHrGrwevPWK # iH5JP66suOVlBuKLJjlUKI3/4vK3oTod5Xa3Oz2Cw1oODtbIa97N8ZAdBgZd3ah9 # 7mjZjcH54kFRwfidz/rkpY5NMru8BlD54MyEOWofvTL2w7aoWmVO99qHEK+SjLkG # x4Mx3aYlnOEvkJ+5yBHvtXS4Gc5T9ltY84AvcwPNuz4RKCORi1s= # =Do8p # -----END PGP SIGNATURE----- # gpg: Signature made Mon 26 Jun 2023 03:50:09 PM CEST # gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469 # gpg: issuer "mst@redhat.com" # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [undefined] # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [undefined] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (53 commits) vhost-vdpa: do not cleanup the vdpa/vhost-net structures if peer nic is present vhost_net: add an assertion for TAP client backends intel_iommu: Fix address space unmap intel_iommu: Fix flag check in replay intel_iommu: Fix a potential issue in VFIO dirty page sync vhost-user: fully use new backend/frontend naming virtio-scsi: avoid dangling host notifier in ->ioeventfd_stop() hw/i386/pc: Clean up pc_machine_initfn vdpa: fix not using CVQ buffer in case of error vdpa: mask _F_CTRL_GUEST_OFFLOADS for vhost vdpa devices vhost: fix vhost_dev_enable_notifiers() error case vdpa: Allow VIRTIO_NET_F_CTRL_GUEST_OFFLOADS in SVQ vdpa: Add vhost_vdpa_net_load_offloads() virtio-net: expose virtio_net_supported_guest_offloads() hw/net/virtio-net: make some VirtIONet const vdpa: reuse virtio_vdev_has_feature() include/hw/virtio: make some VirtIODevice const vdpa: map shadow vrings with MAP_SHARED vdpa: reorder vhost_vdpa_net_cvq_cmd_page_len function vdpa: do not block migration if device has cvq and x-svq=on ... Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-06-26vhost-user: fully use new backend/frontend namingManos Pitsidianakis
Slave/master nomenclature was replaced with backend/frontend in commit 1fc19b65279a ("vhost-user: Adopt new backend naming") This patch replaces all remaining uses of master and slave in the codebase. Signed-off-by: Emmanouil Pitsidianakis <manos.pitsidianakis@linaro.org> Message-Id: <20230613080849.2115347-1-manos.pitsidianakis@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2023-06-26virtio-net: expose virtio_net_supported_guest_offloads()Hawkins Jiawei
To support restoring offloads state in vdpa, it is necessary to expose the function virtio_net_supported_guest_offloads(). According to VirtIO standard, "Upon feature negotiation corresponding offload gets enabled to preserve backward compatibility.". Therefore, QEMU uses this function to get the device supported offloads. This allows QEMU to know the device's defaults and skip the control message sending if these defaults align with the driver's configuration. Note that the device's defaults can mismatch the driver's configuration only at live migration. Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Message-Id: <43679506f3f039a7aa2bdd5b49785107b5dfd7d4.1685704856.git.yin31149@gmail.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com> Tested-by: Eugenio Pérez <eperezma@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-06-26include/hw/virtio: make some VirtIODevice constHawkins Jiawei
The VirtIODevice structure is not modified in virtio_vdev_has_feature(). Therefore, make it const to allow this function to accept const variables. Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Reviewed-by: Eugenio Pérez Martin <eperezma@redhat.com> Message-Id: <16c0561b921310a32c240a4fb6e8cee3ffee16fe.1685704856.git.yin31149@gmail.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com> Tested-by: Eugenio Pérez <eperezma@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-06-26hw/i386/pc: Default to use SMBIOS 3.0 for newer machine modelsSuravee Suthikulpanit
Currently, pc-q35 and pc-i44fx machine models are default to use SMBIOS 2.8 (32-bit entry point). Since SMBIOS 3.0 (64-bit entry point) is now fully supported since QEMU 7.0, default to use SMBIOS 3.0 for newer machine models. This is necessary to avoid the following message when launching a VM with large number of vcpus. "SMBIOS 2.1 table length 66822 exceeds 65535" Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <20230607205717.737749-2-suravee.suthikulpanit@amd.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com>
2023-06-26Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into stagingRichard Henderson
* kvm: reuse per-vcpu stats fd to avoid vcpu interruption * Validate cluster and NUMA node boundary on ARM and RISC-V * various small TCG features from newer processors * Remove dubious 'event_notifier-posix.c' include * fix git-submodule.sh in releases # -----BEGIN PGP SIGNATURE----- # # iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmSZS0IUHHBib256aW5p # QHJlZGhhdC5jb20ACgkQv/vSX3jHroN+tgf/axJdG9NXKCyXgc0vzjKVhSR4Y+tC # EPxkg7Rq7uOMgbph9oTS/2Kzh9LnP6kLt2qnS4igRHGuEBd58yD6fFNDv0LJsK/l # B/d0WGHMKV0KMYOX24rkyfohVu37GhVRsiVSIlIiQVTC9JtYer7WxdnyoDaPKvY8 # dpbKgDrd59vAlsHrpj7ZubVQPcL3lXrLryimpDohMH6Ba+4wZq+7dKPpal97QOP2 # 3i7isUBTQiMOcVjW6GEiNcDLSJqj5DSgylhdFnaBsq/ThpC2PxWoXcCbV28QELzf # 5+J+RXQavmeWKZMR0q98iBzWbrsVtaSxAkHHiwbUMMqQvkfY6Dpo5dMHWw== # =WHE2 # -----END PGP SIGNATURE----- # gpg: Signature made Mon 26 Jun 2023 10:24:34 AM CEST # gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83 # gpg: issuer "pbonzini@redhat.com" # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [undefined] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [undefined] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * tag 'for-upstream' of https://gitlab.com/bonzini/qemu: git-submodule.sh: allow running in validate mode without previous update target/i386: implement SYSCALL/SYSRET in 32-bit emulators target/i386: implement RDPID in TCG target/i386: sysret and sysexit are privileged target/i386: AMD only supports SYSENTER/SYSEXIT in 32-bit mode target/i386: Intel only supports SYSCALL/SYSRET in long mode target/i386: TCG supports WBNOINVD target/i386: TCG supports XSAVEERPTR target/i386: do not accept RDSEED if CPUID bit absent target/i386: TCG supports RDSEED target/i386: TCG supports 3DNow! prefetch(w) target/i386: fix INVD vmexit kvm: reuse per-vcpu stats fd to avoid vcpu interruption hw/riscv: Validate cluster and NUMA node boundary hw/arm: Validate cluster and NUMA node boundary numa: Validate cluster and NUMA node boundary if required hw/remote/proxy: Remove dubious 'event_notifier-posix.c' include build: further refine build.ninja rules Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-06-26kvm: reuse per-vcpu stats fd to avoid vcpu interruptionMarcelo Tosatti
A regression has been detected in latency testing of KVM guests. More specifically, it was observed that the cyclictest numbers inside of an isolated vcpu (running on isolated pcpu) are: Where a maximum of 50us is acceptable. The implementation of KVM_GET_STATS_FD uses run_on_cpu to query per vcpu statistics, which interrupts the vcpu (and is unnecessary). To fix this, open the per vcpu stats fd on vcpu initialization, and read from that fd from QEMU's main thread. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-06-26numa: Validate cluster and NUMA node boundary if requiredGavin Shan
For some architectures like ARM64, multiple CPUs in one cluster can be associated with different NUMA nodes, which is irregular configuration because we shouldn't have this in baremetal environment. The irregular configuration causes Linux guest to misbehave, as the following warning messages indicate. -smp 6,maxcpus=6,sockets=2,clusters=1,cores=3,threads=1 \ -numa node,nodeid=0,cpus=0-1,memdev=ram0 \ -numa node,nodeid=1,cpus=2-3,memdev=ram1 \ -numa node,nodeid=2,cpus=4-5,memdev=ram2 \ ------------[ cut here ]------------ WARNING: CPU: 0 PID: 1 at kernel/sched/topology.c:2271 build_sched_domains+0x284/0x910 Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.14.0-268.el9.aarch64 #1 pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : build_sched_domains+0x284/0x910 lr : build_sched_domains+0x184/0x910 sp : ffff80000804bd50 x29: ffff80000804bd50 x28: 0000000000000002 x27: 0000000000000000 x26: ffff800009cf9a80 x25: 0000000000000000 x24: ffff800009cbf840 x23: ffff000080325000 x22: ffff0000005df800 x21: ffff80000a4ce508 x20: 0000000000000000 x19: ffff000080324440 x18: 0000000000000014 x17: 00000000388925c0 x16: 000000005386a066 x15: 000000009c10cc2e x14: 00000000000001c0 x13: 0000000000000001 x12: ffff00007fffb1a0 x11: ffff00007fffb180 x10: ffff80000a4ce508 x9 : 0000000000000041 x8 : ffff80000a4ce500 x7 : ffff80000a4cf920 x6 : 0000000000000001 x5 : 0000000000000001 x4 : 0000000000000007 x3 : 0000000000000002 x2 : 0000000000001000 x1 : ffff80000a4cf928 x0 : 0000000000000001 Call trace: build_sched_domains+0x284/0x910 sched_init_domains+0xac/0xe0 sched_init_smp+0x48/0xc8 kernel_init_freeable+0x140/0x1ac kernel_init+0x28/0x140 ret_from_fork+0x10/0x20 Improve the situation to warn when multiple CPUs in one cluster have been associated with different NUMA nodes. However, one NUMA node is allowed to be associated with different clusters. Signed-off-by: Gavin Shan <gshan@redhat.com> Acked-by: Philippe Mathieu-Daudé <philmd@linaro.org> Acked-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20230509002739.18388-2-gshan@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-06-25pnv/xive2: Add a get_config() method on the presenter classFrederic Barrat
The presenters for xive on P9 and P10 are mostly similar but the behavior can be tuned through a few CQ registers. This patch adds a "get_config" method, which will allow to access that config from the presenter in a later patch. For now, just define the config for the TIMA version. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Cédric Le Goater <clg@kaod.org>
2023-06-25target/ppc: Add msgsnd/p and DPDES SMT supportNicholas Piggin
Doorbells in SMT need to coordinate msgsnd/msgclr and DPDES access from multiple threads that affect the same state. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Cédric Le Goater <clg@kaod.org>
2023-06-25ppc/spapr: Move spapr nested HV to a new fileNicholas Piggin
Create spapr_nested.c for most of the nested HV implementation. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com> Signed-off-by: Cédric Le Goater <clg@kaod.org>
2023-06-25ppc/spapr: Add a nested state structNicholas Piggin
Rather than use a copy of CPUPPCState to store the host state while the environment has been switched to the L2, use a new struct for this purpose. Have helper functions to save and load this host state. Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Cédric Le Goater <clg@kaod.org>
2023-06-23hw/cxl/events: Add injection of Memory Module EventsJonathan Cameron
These events include a copy of the device health information at the time of the event. Actually using the emulated device health would require a lot of controls to manipulate that state. Given the aim of this injection code is to just test the flows when events occur, inject the contents of the device health state as well. Future work may add more sophisticate device health emulation including direct generation of these records when events occur (such as a temperature threshold being crossed). That does not reduce the usefulness of this more basic generation of the events. Acked-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Message-Id: <20230530133603.16934-8-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-06-22hw/cxl/events: Add injection of DRAM eventsJonathan Cameron
Defined in CXL r3.0 8.2.9.2.1.2 DRAM Event Record, this event provides information related to DRAM devices. Example injection command in QMP: { "execute": "cxl-inject-dram-event", "arguments": { "path": "/machine/peripheral/cxl-mem0", "log": "informational", "flags": 1, "dpa": 1000, "descriptor": 3, "type": 3, "transaction-type": 192, "channel": 3, "rank": 17, "nibble-mask": 37421234, "bank-group": 7, "bank": 11, "row": 2, "column": 77, "correction-mask": [33, 44, 55,66] }} Acked-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Message-Id: <20230530133603.16934-7-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-06-22hw/cxl/events: Add injection of General Media EventsIra Weiny
To facilitate testing provide a QMP command to inject a general media event. The event can be added to the log specified. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Fan Ni <fan.ni@samsung.com> Acked-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Message-Id: <20230530133603.16934-6-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-06-22hw/cxl/events: Add event interrupt supportIra Weiny
Replace the stubbed out CXL Get/Set Event interrupt policy mailbox commands. Enable those commands to control interrupts for each of the event log types. Skip the standard input mailbox length on the Set command due to DCD being optional. Perform the checks separately. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Message-Id: <20230530133603.16934-5-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-06-22hw/cxl/events: Wire up get/clear event mailbox commandsIra Weiny
CXL testing is benefited from an artificial event log injection mechanism. Add an event log infrastructure to insert, get, and clear events from the various logs available on a device. Replace the stubbed out CXL Get/Clear Event mailbox commands with commands that operate on the new infrastructure. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Fan Ni <fan.ni@samsung.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Message-Id: <20230530133603.16934-4-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-06-22hw/cxl: Move CXLRetCode definition to cxl_device.hJonathan Cameron
Following patches will need access to the mailbox return code type so move it to the header. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Fan Ni <fan.ni@samsung.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Message-Id: <20230530133603.16934-3-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-06-22hw/cxl/events: Add event status registerIra Weiny
The device status register block was defined. However, there were no individual registers nor any data wired up. Define the event status register [CXL 3.0; 8.2.8.3.1] as part of the device status register block. Wire up the register and initialize the event status for each log. To support CXL 3.0 the version of the device status register block needs to be 2. Change the macro to allow for setting the version. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Fan Ni <fan.ni@samsung.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Message-Id: <20230530133603.16934-2-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-06-22hw/cxl: Add clear poison mailbox command support.Jonathan Cameron
Current implementation is very simple so many of the corner cases do not exist (e.g. fragmenting larger poison list entries) Reviewed-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Message-Id: <20230526170010.574-5-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-06-22hw/cxl: QMP based poison injection supportJonathan Cameron
Inject poison using QMP command cxl-inject-poison to add an entry to the poison list. For now, the poison is not returned CXL.mem reads, but only via the mailbox command Get Poison List. So a normal memory read to an address that is on the poison list will not yet result in a synchronous exception (and similar for partial cacheline writes). That is left for a future patch. See CXL rev 3.0, sec 8.2.9.8.4.1 Get Poison list (Opcode 4300h) Kernel patches to use this interface here: https://lore.kernel.org/linux-cxl/cover.1665606782.git.alison.schofield@intel.com/ To inject poison using QMP (telnet to the QMP port) { "execute": "qmp_capabilities" } { "execute": "cxl-inject-poison", "arguments": { "path": "/machine/peripheral/cxl-pmem0", "start": 2048, "length": 256 } } Adjusted to select a device on your machine. Note that the poison list supported is kept short enough to avoid the complexity of state machine that is needed to handle the MORE flag. Reviewed-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Acked-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Message-Id: <20230526170010.574-3-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-06-22q800: move macfb device to Q800MachineStateMark Cave-Ayland
Also change the instantiation of the macfb device to use object_initialize_child(). Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20230621085353.113233-23-mark.cave-ayland@ilande.co.uk> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-06-22q800: move mac-nubus-bridge device to Q800MachineStateMark Cave-Ayland
Also change the instantiation of the mac-nubus-bridge device to use object_initialize_child() and map the Nubus address space using memory_region_add_subregion() instead of sysbus_mmio_map(). Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20230621085353.113233-21-mark.cave-ayland@ilande.co.uk> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-06-22q800: move SWIM device to Q800MachineStateMark Cave-Ayland
Also change the instantiation of the SWIM device to use object_initialize_child(). Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20230621085353.113233-20-mark.cave-ayland@ilande.co.uk> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-06-22q800: move ESP device to Q800MachineStateMark Cave-Ayland
Also change the instantiation of the ESP device to use object_initialize_child(). Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20230621085353.113233-19-mark.cave-ayland@ilande.co.uk> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-06-22q800: move escc_orgate device to Q800MachineStateMark Cave-Ayland
Also change the instantiation of the escc_orgate device to use object_initialize_child(). Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20230621085353.113233-18-mark.cave-ayland@ilande.co.uk> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-06-22q800: move ESCC device to Q800MachineStateMark Cave-Ayland
Also change the instantiation of the ESCC device to use object_initialize_child(). Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20230621085353.113233-17-mark.cave-ayland@ilande.co.uk> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-06-22q800: move dp8393x device to Q800MachineStateMark Cave-Ayland
Also change the instantiation of the dp8393x device to use object_initialize_child(). Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> CC: Jason Wang <jasowang@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20230621085353.113233-16-mark.cave-ayland@ilande.co.uk> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-06-22hw/net/dp8393x.c: move TYPE_DP8393X and dp8393xState into dp8393x.hMark Cave-Ayland
This is to enable them to be used outside of dp8393x.c. Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> CC: Jason Wang <jasowang@redhat.com> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20230621085353.113233-15-mark.cave-ayland@ilande.co.uk> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-06-22q800: move VIA2 device to Q800MachineStateMark Cave-Ayland
Also change the instantiation of the VIA2 device to use object_initialize_child(). Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20230621085353.113233-14-mark.cave-ayland@ilande.co.uk> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-06-22q800: move VIA1 device to Q800MachineStateMark Cave-Ayland
Also change the instantiation of the VIA1 device to use object_initialize_child(). Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20230621085353.113233-13-mark.cave-ayland@ilande.co.uk> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-06-22q800: reimplement mac-io region aliasing using IO memory regionMark Cave-Ayland
The current use of aliased memory regions causes us 2 problems: firstly the output of "info qom-tree" is absolutely huge and difficult to read, and secondly we have already reached the internal limit for memory regions as adding any new memory region into the mac-io region causes QEMU to assert with "phys_section_add: Assertion `map->sections_nb < TARGET_PAGE_SIZE' failed". Implement the mac-io region aliasing using a single IO memory region that applies IO_SLICE_MASK representing the maximum size of the aliased region and then forwarding the access to the existing mac-io memory region using the address space API. Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20230621085353.113233-12-mark.cave-ayland@ilande.co.uk> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-06-22q800: introduce mac-io container memory regionMark Cave-Ayland
Move all devices from the IO region to within the container in preparation for updating the IO aliasing mechanism. Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20230621085353.113233-11-mark.cave-ayland@ilande.co.uk> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-06-22q800: move GLUE device to Q800MachineStateMark Cave-Ayland
Also change the instantiation of the GLUE device to use object_initialize_child(). Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20230621085353.113233-10-mark.cave-ayland@ilande.co.uk> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-06-22q800: move GLUE device into separate q800-glue.c fileMark Cave-Ayland
This will allow the q800-glue.h header to be included separately so that the GLUE device can be referenced externally. Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20230621085353.113233-8-mark.cave-ayland@ilande.co.uk> [lv: update comment] Signed-off-by: Laurent Vivier <laurent@vivier.eu>