aboutsummaryrefslogtreecommitdiff
path: root/hw/i386
AgeCommit message (Collapse)Author
2023-03-06xen/pt: reserve PCI slot 2 for Intel igd-passthruChuck Zmudzinski
Intel specifies that the Intel IGD must occupy slot 2 on the PCI bus, as noted in docs/igd-assign.txt in the Qemu source code. Currently, when the xl toolstack is used to configure a Xen HVM guest with Intel IGD passthrough to the guest with the Qemu upstream device model, a Qemu emulated PCI device will occupy slot 2 and the Intel IGD will occupy a different slot. This problem often prevents the guest from booting. The only available workarounds are not good: Configure Xen HVM guests to use the old and no longer maintained Qemu traditional device model available from xenbits.xen.org which does reserve slot 2 for the Intel IGD or use the "pc" machine type instead of the "xenfv" machine type and add the xen platform device at slot 3 using a command line option instead of patching qemu to fix the "xenfv" machine type directly. The second workaround causes some degredation in startup performance such as a longer boot time and reduced resolution of the grub menu that is displayed on the monitor. This patch avoids that reduced startup performance when using the Qemu upstream device model for Xen HVM guests configured with the igd-passthru=on option. To implement this feature in the Qemu upstream device model for Xen HVM guests, introduce the following new functions, types, and macros: * XEN_PT_DEVICE_CLASS declaration, based on the existing TYPE_XEN_PT_DEVICE * XEN_PT_DEVICE_GET_CLASS macro helper function for XEN_PT_DEVICE_CLASS * typedef XenPTQdevRealize function pointer * XEN_PCI_IGD_SLOT_MASK, the value of slot_reserved_mask to reserve slot 2 * xen_igd_reserve_slot and xen_igd_clear_slot functions Michael Tsirkin: * Introduce XEN_PCI_IGD_DOMAIN, XEN_PCI_IGD_BUS, XEN_PCI_IGD_DEV, and XEN_PCI_IGD_FN - use them to compute the value of XEN_PCI_IGD_SLOT_MASK The new xen_igd_reserve_slot function uses the existing slot_reserved_mask member of PCIBus to reserve PCI slot 2 for Xen HVM guests configured using the xl toolstack with the gfx_passthru option enabled, which sets the igd-passthru=on option to Qemu for the Xen HVM machine type. The new xen_igd_reserve_slot function also needs to be implemented in hw/xen/xen_pt_stub.c to prevent FTBFS during the link stage for the case when Qemu is configured with --enable-xen and --disable-xen-pci-passthrough, in which case it does nothing. The new xen_igd_clear_slot function overrides qdev->realize of the parent PCI device class to enable the Intel IGD to occupy slot 2 on the PCI bus since slot 2 was reserved by xen_igd_reserve_slot when the PCI bus was created in hw/i386/pc_piix.c for the case when igd-passthru=on. Move the call to xen_host_pci_device_get, and the associated error handling, from xen_pt_realize to the new xen_igd_clear_slot function to initialize the device class and vendor values which enables the checks for the Intel IGD to succeed. The verification that the host device is an Intel IGD to be passed through is done by checking the domain, bus, slot, and function values as well as by checking that gfx_passthru is enabled, the device class is VGA, and the device vendor in Intel. Signed-off-by: Chuck Zmudzinski <brchuckz@aol.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Message-Id: <b1b4a21fe9a600b1322742dda55a40e9961daa57.1674346505.git.brchuckz@aol.com> Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
2023-03-03Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu ↵Peter Maydell
into staging virtio,pc,pci: features, cleanups, fixes vhost-user support without ioeventfd word replacements in vhost user spec shpc improvements cleanups, fixes all over the place Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # -----BEGIN PGP SIGNATURE----- # # iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmQBO8QPHG1zdEByZWRo # YXQuY29tAAoJECgfDbjSjVRpMUMH/3/FVp4qaF4CDwCHn7xWFRJpOREIhX/iWfUu # lGkwxnB7Lfyqdg7i4CAfgMf2emWKZchEE2DamfCo5bIX0IgRU3DWcOdR9ePvJ29J # cKwIYpxZcB4RYSoWL5OUakQLCT3JOu4XWaXeVjyHABjQhf3lGpwN4KmIOBGOy/N6 # 0YHOQScW2eW62wIOwhAEuYQceMt6KU32Uw3tLnMbJliiBf3a/hPctVNM9TFY9pcd # UYHGfBx/zD45owf1lTVEQFDg0eqPZKWW29g5haiOd5oAyXHHolzu+bt3bU7lH46b # f7iP12LqDudyrgoF5YWv3NJ4HaGm5V3kPqNqLLF/mjF7alxG+N8= # =hN3h # -----END PGP SIGNATURE----- # gpg: Signature made Fri 03 Mar 2023 00:13:56 GMT # gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469 # gpg: issuer "mst@redhat.com" # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full] # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full] # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (53 commits) tests/data/acpi/virt: drop (most) duplicate files. hw/cxl/mailbox: Use new UUID network order define for cel_uuid qemu/uuid: Add UUID static initializer qemu/bswap: Add const_le64() tests: acpi: Update q35/DSDT.cxl for removed duplicate UID hw/i386/acpi: Drop duplicate _UID entry for CXL root bridge tests/acpi: Allow update of q35/DSDT.cxl hw/cxl: Add CXL_CAPACITY_MULTIPLIER definition hw/cxl: set cxl-type3 device type to PCI_CLASS_MEMORY_CXL hw/pci-bridge/cxl_downstream: Fix type naming mismatch hw/mem/cxl_type3: Improve error handling in realize() MAINTAINERS: Add Fan Ni as Compute eXpress Link QEMU reviewer intel-iommu: send UNMAP notifications for domain or global inv desc smmu: switch to use memory_region_unmap_iommu_notifier_range() memory: introduce memory_region_unmap_iommu_notifier_range() intel-iommu: fail DEVIOTLB_UNMAP without dt mode intel-iommu: fail MAP notifier without caching mode memory: Optimize replay of guest mapping chardev/char-socket: set s->listener = NULL in char_socket_finalize hw/pci: Trace IRQ routing on PCI topology ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2023-03-02hw/i386/acpi: Drop duplicate _UID entry for CXL root bridgeJonathan Cameron
Noticed as this prevents iASL disasembling the DSDT table. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Gregory Price <gregory.price@memverge.com> Tested-by: Gregory Price <gregory.price@memverge.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Message-Id: <20230206172816.8201-7-Jonathan.Cameron@huawei.com> Reviewed-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-03-02intel-iommu: send UNMAP notifications for domain or global inv descPeter Xu
We don't send UNMAP notification upon domain or global invalidation which will lead the notifier can't work correctly. One example is to use vhost remote IOTLB without enabling device IOTLB. Fixing this by sending UNMAP notification. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230223065924.42503-6-jasowang@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-03-02intel-iommu: fail DEVIOTLB_UNMAP without dt modeJason Wang
Without dt mode, device IOTLB notifier won't work since guest won't send device IOTLB invalidation descriptor in this case. Let's fail early instead of misbehaving silently. Reviewed-by: Laurent Vivier <lvivier@redhat.com> Tested-by: Laurent Vivier <lvivier@redhat.com> Tested-by: Viktor Prutyanov <viktor@daynix.com> Buglink: https://bugzilla.redhat.com/2156876 Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230223065924.42503-3-jasowang@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-03-02intel-iommu: fail MAP notifier without caching modeJason Wang
Without caching mode, MAP notifier won't work correctly since guest won't send IOTLB update event when it establishes new mappings in the I/O page tables. Let's fail the IOMMU notifiers early instead of misbehaving silently. Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Viktor Prutyanov <viktor@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230223065924.42503-2-jasowang@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-03-02memory: Optimize replay of guest mappingZhenzhong Duan
On x86, there are two notifiers registered due to vtd-ir memory region splitting the whole address space. During replay of the address space for each notifier, the whole address space is scanned which is unnecessory. We only need to scan the space belong to notifier montiored space. Assert when notifier is used to monitor beyond iommu memory region's address space. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Message-Id: <20230215065238.713041-1-zhenzhong.duan@intel.com> Acked-by: Peter Xu <peterx@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-03-02Revert "hw/i386: pass RNG seed via setup_data entry"Michael S. Tsirkin
This reverts commit 67f7e426e53833a5db75b0d813e8d537b8a75bd2. Additionally to the automatic revert, I went over the code and dropped all mentions of legacy_no_rng_seed manually, effectively reverting a combination of 2 additional commits: commit ffe2d2382e5f1aae1abc4081af407905ef380311 Author: Jason A. Donenfeld <Jason@zx2c4.com> Date: Wed Sep 21 11:31:34 2022 +0200 x86: re-enable rng seeding via SetupData commit 3824e25db1a84fadc50b88dfbe27047aa2f7f85d Author: Gerd Hoffmann <kraxel@redhat.com> Date: Wed Aug 17 10:39:40 2022 +0200 x86: disable rng seeding via setup_data Fixes: 67f7e426e5 ("hw/i386: pass RNG seed via setup_data entry") Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Dov Murik <dovmurik@linux.ibm.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2023-03-02Revert "x86: return modified setup_data only if read as memory, not as file"Michael S. Tsirkin
This reverts commit e935b735085dfa61d8e6d276b6f9e7687796a3c7. Fixes: e935b73508 ("x86: return modified setup_data only if read as memory, not as file") Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Dov Murik <dovmurik@linux.ibm.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2023-03-02Revert "x86: use typedef for SetupData struct"Michael S. Tsirkin
This reverts commit eebb38a5633a77f5fa79d6486d5b2fcf8fbe3c07. Fixes: eebb38a563 ("x86: use typedef for SetupData struct") Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Dov Murik <dovmurik@linux.ibm.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2023-03-02Revert "x86: reinitialize RNG seed on system reboot"Michael S. Tsirkin
This reverts commit 763a2828bf313ed55878b09759dc435355035f2e. Fixes: 763a2828bf ("x86: reinitialize RNG seed on system reboot") Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Dov Murik <dovmurik@linux.ibm.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2023-03-02Revert "x86: re-initialize RNG seed when selecting kernel"Michael S. Tsirkin
This reverts commit cc63374a5a7c240b7d3be734ef589dabbefc7527. Fixes: cc63374a5a ("x86: re-initialize RNG seed when selecting kernel") Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Dov Murik <dovmurik@linux.ibm.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2023-03-02Revert "x86: do not re-randomize RNG seed on snapshot load"Michael S. Tsirkin
This reverts commit 14b29fea742034186403914b4d013d0e83f19e78. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Fixes: 14b29fea74 ("x86: do not re-randomize RNG seed on snapshot load") Tested-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Dov Murik <dovmurik@linux.ibm.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2023-03-02Revert "x86: don't let decompressed kernel image clobber setup_data"Michael S. Tsirkin
This reverts commit eac7a7791bb6d719233deed750034042318ffd56. Fixes: eac7a7791b ("x86: don't let decompressed kernel image clobber setup_data") Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Dov Murik <dovmurik@linux.ibm.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2023-03-01hw/xen: Subsume xen_be_register_common() into xen_be_init()David Woodhouse
Every caller of xen_be_init() checks and exits on error, then calls xen_be_register_common(). Just make xen_be_init() abort for itself and return void, and register the common devices too. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01kvm/i386: Add xen-evtchn-max-pirq propertyDavid Woodhouse
The default number of PIRQs is set to 256 to avoid issues with 32-bit MSI devices. Allow it to be increased if the user desires. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01hw/xen: Support MSI mapping to PIRQDavid Woodhouse
The way that Xen handles MSI PIRQs is kind of awful. There is a special MSI message which targets a PIRQ. The vector in the low bits of data must be zero. The low 8 bits of the PIRQ# are in the destination ID field, the extended destination ID field is unused, and instead the high bits of the PIRQ# are in the high 32 bits of the address. Using the high bits of the address means that we can't intercept and translate these messages in kvm_send_msi(), because they won't be caught by the APIC — addresses like 0x1000fee46000 aren't in the APIC's range. So we catch them in pci_msi_trigger() instead, and deliver the event channel directly. That isn't even the worst part. The worst part is that Xen snoops on writes to devices' MSI vectors while they are *masked*. When a MSI message is written which looks like it targets a PIRQ, it remembers the device and vector for later. When the guest makes a hypercall to bind that PIRQ# (snooped from a marked MSI vector) to an event channel port, Xen *unmasks* that MSI vector on the device. Xen guests using PIRQ delivery of MSI don't ever actually unmask the MSI for themselves. Now that this is working we can finally enable XENFEAT_hvm_pirqs and let the guest use it all. Tested with passthrough igb and emulated e1000e + AHCI. CPU0 CPU1 0: 65 0 IO-APIC 2-edge timer 1: 0 14 xen-pirq 1-ioapic-edge i8042 4: 0 846 xen-pirq 4-ioapic-edge ttyS0 8: 1 0 xen-pirq 8-ioapic-edge rtc0 9: 0 0 xen-pirq 9-ioapic-level acpi 12: 257 0 xen-pirq 12-ioapic-edge i8042 24: 9600 0 xen-percpu -virq timer0 25: 2758 0 xen-percpu -ipi resched0 26: 0 0 xen-percpu -ipi callfunc0 27: 0 0 xen-percpu -virq debug0 28: 1526 0 xen-percpu -ipi callfuncsingle0 29: 0 0 xen-percpu -ipi spinlock0 30: 0 8608 xen-percpu -virq timer1 31: 0 874 xen-percpu -ipi resched1 32: 0 0 xen-percpu -ipi callfunc1 33: 0 0 xen-percpu -virq debug1 34: 0 1617 xen-percpu -ipi callfuncsingle1 35: 0 0 xen-percpu -ipi spinlock1 36: 8 0 xen-dyn -event xenbus 37: 0 6046 xen-pirq -msi ahci[0000:00:03.0] 38: 1 0 xen-pirq -msi-x ens4 39: 0 73 xen-pirq -msi-x ens4-rx-0 40: 14 0 xen-pirq -msi-x ens4-rx-1 41: 0 32 xen-pirq -msi-x ens4-tx-0 42: 47 0 xen-pirq -msi-x ens4-tx-1 Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01hw/xen: Support GSI mapping to PIRQDavid Woodhouse
If I advertise XENFEAT_hvm_pirqs then a guest now boots successfully as long as I tell it 'pci=nomsi'. [root@localhost ~]# cat /proc/interrupts CPU0 0: 52 IO-APIC 2-edge timer 1: 16 xen-pirq 1-ioapic-edge i8042 4: 1534 xen-pirq 4-ioapic-edge ttyS0 8: 1 xen-pirq 8-ioapic-edge rtc0 9: 0 xen-pirq 9-ioapic-level acpi 11: 5648 xen-pirq 11-ioapic-level ahci[0000:00:04.0] 12: 257 xen-pirq 12-ioapic-edge i8042 ... Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01hw/xen: Implement emulated PIRQ hypercall supportDavid Woodhouse
This wires up the basic infrastructure but the actual interrupts aren't there yet, so don't advertise it to the guest. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01i386/xen: Implement HYPERVISOR_physdev_opDavid Woodhouse
Just hook up the basic hypercalls to stubs in xen_evtchn.c for now. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01hw/xen: Automatically add xen-platform PCI device for emulated Xen guestsDavid Woodhouse
It isn't strictly mandatory but Linux guests at least will only map their grant tables over the dummy BAR that it provides, and don't have sufficient wit to map them in any other unused part of their guest address space. So include it by default for minimal surprise factor. As I come to document "how to run a Xen guest in QEMU", this means one fewer thing to tell the user about, according to the mantra of "if it needs documenting, fix it first, then document what remains". Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01hw/xen: Add basic ring handling to xenstoreDavid Woodhouse
Extract requests, return ENOSYS to all of them. This is enough to allow older Linux guests to boot, as they need *something* back but it doesn't matter much what. A full implementation of a single-tentant internal XenStore copy-on-write tree with transactions and watches is waiting in the wings to be sent in a subsequent round of patches along with hooking up the actual PV disk back end in qemu, but this is enough to get guests booting for now. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01hw/xen: Add xen_xenstore device for xenstore emulationDavid Woodhouse
Just the basic shell, with the event channel hookup. It only dumps the buffer for now; a real ring implmentation will come in a subsequent patch. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01hw/xen: Add backend implementation of interdomain event channel supportDavid Woodhouse
The provides the QEMU side of interdomain event channels, allowing events to be sent to/from the guest. The API mirrors libxenevtchn, and in time both this and the real Xen one will be available through ops structures so that the PV backend drivers can use the correct one as appropriate. For now, this implementation can be used directly by our XenStore which will be for emulated mode only. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01i386/xen: handle PV timer hypercallsJoao Martins
Introduce support for one shot and periodic mode of Xen PV timers, whereby timer interrupts come through a special virq event channel with deadlines being set through: 1) set_timer_op hypercall (only oneshot) 2) vcpu_op hypercall for {set,stop}_{singleshot,periodic}_timer hypercalls Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01hw/xen: Implement GNTTABOP_query_sizeDavid Woodhouse
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01i386/xen: Implement HYPERVISOR_grant_table_op and GNTTABOP_[gs]et_versonDavid Woodhouse
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01hw/xen: Support mapping grant framesDavid Woodhouse
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01hw/xen: Add xen_gnttab device for grant table emulationDavid Woodhouse
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01hw/xen: Support HVM_PARAM_CALLBACK_TYPE_PCI_INTX callbackDavid Woodhouse
The guest is permitted to specify an arbitrary domain/bus/device/function and INTX pin from which the callback IRQ shall appear to have come. In QEMU we can only easily do this for devices that actually exist, and even that requires us "knowing" that it's a PCMachine in order to find the PCI root bus — although that's OK really because it's always true. We also don't get to get notified of INTX routing changes, because we can't do that as a passive observer; if we try to register a notifier it will overwrite any existing notifier callback on the device. But in practice, guests using PCI_INTX will only ever use pin A on the Xen platform device, and won't swizzle the INTX routing after they set it up. So this is just fine. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01hw/xen: Support HVM_PARAM_CALLBACK_TYPE_GSI callbackDavid Woodhouse
The GSI callback (and later PCI_INTX) is a level triggered interrupt. It is asserted when an event channel is delivered to vCPU0, and is supposed to be cleared when the vcpu_info->evtchn_upcall_pending field for vCPU0 is cleared again. Thankfully, Xen does *not* assert the GSI if the guest sets its own evtchn_upcall_pending field; we only need to assert the GSI when we have delivered an event for ourselves. So that's the easy part, kind of. There's a slight complexity in that we need to hold the BQL before we can call qemu_set_irq(), and we definitely can't do that while holding our own port_lock (because we'll need to take that from the qemu-side functions that the PV backend drivers will call). So if we end up wanting to set the IRQ in a context where we *don't* already hold the BQL, defer to a BH. However, we *do* need to poll for the evtchn_upcall_pending flag being cleared. In an ideal world we would poll that when the EOI happens on the PIC/IOAPIC. That's how it works in the kernel with the VFIO eventfd pairs — one is used to trigger the interrupt, and the other works in the other direction to 'resample' on EOI, and trigger the first eventfd again if the line is still active. However, QEMU doesn't seem to do that. Even VFIO level interrupts seem to be supported by temporarily unmapping the device's BARs from the guest when an interrupt happens, then trapping *all* MMIO to the device and sending the 'resample' event on *every* MMIO access until the IRQ is cleared! Maybe in future we'll plumb the 'resample' concept through QEMU's irq framework but for now we'll do what Xen itself does: just check the flag on every vmexit if the upcall GSI is known to be asserted. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01i386/xen: add monitor commands to test event injectionJoao Martins
Specifically add listing, injection of event channels. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01hw/xen: Implement EVTCHNOP_resetDavid Woodhouse
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01hw/xen: Implement EVTCHNOP_bind_vcpuDavid Woodhouse
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01hw/xen: Implement EVTCHNOP_bind_interdomainDavid Woodhouse
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01hw/xen: Implement EVTCHNOP_alloc_unboundDavid Woodhouse
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01hw/xen: Implement EVTCHNOP_sendDavid Woodhouse
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01hw/xen: Implement EVTCHNOP_bind_ipiDavid Woodhouse
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01hw/xen: Implement EVTCHNOP_bind_virqDavid Woodhouse
Add the array of virq ports to each vCPU so that we can deliver timers, debug ports, etc. Global virqs are allocated against vCPU 0 initially, but can be migrated to other vCPUs (when we implement that). The kernel needs to know about VIRQ_TIMER in order to accelerate timers, so tell it via KVM_XEN_VCPU_ATTR_TYPE_TIMER. Also save/restore the value of the singleshot timer across migration, as the kernel will handle the hypercalls automatically now. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01hw/xen: Implement EVTCHNOP_unmaskDavid Woodhouse
This finally comes with a mechanism for actually injecting events into the guest vCPU, with all the atomic-test-and-set that's involved in setting the bit in the shinfo, then the index in the vcpu_info, and injecting either the lapic vector as MSI, or letting KVM inject the bare vector. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01hw/xen: Implement EVTCHNOP_closeDavid Woodhouse
It calls an internal close_port() helper which will also be used from EVTCHNOP_reset and will actually do the work to disconnect/unbind a port once any of that is actually implemented in the first place. That in turn calls a free_port() internal function which will be in error paths after allocation. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01hw/xen: Implement EVTCHNOP_statusDavid Woodhouse
This adds the basic structure for maintaining the port table and reporting the status of ports therein. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01hw/xen: Add xen_evtchn device for event channel emulationDavid Woodhouse
Include basic support for setting HVM_PARAM_CALLBACK_IRQ to the global vector method HVM_PARAM_CALLBACK_TYPE_VECTOR, which is handled in-kernel by raising the vector whenever the vCPU's vcpu_info->evtchn_upcall_pending flag is set. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01i386/xen: manage and save/restore Xen guest long_mode settingDavid Woodhouse
Xen will "latch" the guest's 32-bit or 64-bit ("long mode") setting when the guest writes the MSR to fill in the hypercall page, or when the guest sets the event channel callback in HVM_PARAM_CALLBACK_IRQ. KVM handles the former and sets the kernel's long_mode flag accordingly. The latter will be handled in userspace. Keep them in sync by noticing when a hypercall is made in a mode that doesn't match qemu's idea of the guest mode, and resyncing from the kernel. Do that same sync right before serialization too, in case the guest has set the hypercall page but hasn't yet made a system call. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01i386/xen: add pc_machine_kvm_type to initialize XEN_EMULATE modeDavid Woodhouse
The xen_overlay device (and later similar devices for event channels and grant tables) need to be instantiated. Do this from a kvm_type method on the PC machine derivatives, since KVM is only way to support Xen emulation for now. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01hw/xen: Add xen_overlay device for emulating shared xenheap pagesDavid Woodhouse
For the shared info page and for grant tables, Xen shares its own pages from the "Xen heap" to the guest. The guest requests that a given page from a certain address space (XENMAPSPACE_shared_info, etc.) be mapped to a given GPA using the XENMEM_add_to_physmap hypercall. To support that in qemu when *emulating* Xen, create a memory region (migratable) and allow it to be mapped as an overlay when requested. Xen theoretically allows the same page to be mapped multiple times into the guest, but that's hard to track and reinstate over migration, so we automatically *unmap* any previous mapping when creating a new one. This approach has been used in production with.... a non-trivial number of guests expecting true Xen, without any problems yet being noticed. This adds just the shared info page for now. The grant tables will be a larger region, and will need to be overlaid one page at a time. I think that means I need to create separate aliases for each page of the overall grant_frames region, so that they can be mapped individually. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01xen-platform: allow its creation with XEN_EMULATE modeJoao Martins
The only thing we need to fix to make this build is the PIO hack which sets the BIOS memory areas to R/W v.s. R/O. Theoretically we could hook that up to the PAM registers on the emulated PIIX, but in practice nobody cares, so just leave it doing nothing. Now it builds without actual Xen, move it to CONFIG_XEN_BUS to include it in the KVM-only builds. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01xen-platform: exclude vfio-pci from the PCI platform unplugJoao Martins
Such that PCI passthrough devices work for Xen emulated guests. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01xen: add CONFIG_XEN_BUS and CONFIG_XEN_EMU options for Xen emulationDavid Woodhouse
The XEN_EMU option will cover core Xen support in target/, which exists only for x86 with KVM today but could theoretically also be implemented on Arm/Aarch64 and with TCG or other accelerators (if anyone wants to run the gauntlet of struct layout compatibility, errno mapping, and the rest of that fui). It will also cover the support for architecture-independent grant table and event channel support which will be added in hw/i386/kvm/ (on the basis that the non-KVM support is very theoretical and making it not use KVM directly seems like gratuitous overengineering at this point). The XEN_BUS option is for the xenfv platform support, which will now be used both by XEN_EMU and by real Xen. The XEN option remains dependent on the Xen runtime libraries, and covers support for real Xen. Some code which currently resides under CONFIG_XEN will be moving to CONFIG_XEN_BUS over time as the direct dependencies on Xen runtime libraries are eliminated. The Xen PCI platform device will also reside under CONFIG_XEN_BUS. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-02-27hw/ide: Declare ide_get_[geometry/bios_chs_trans] in 'hw/ide/internal.h'Philippe Mathieu-Daudé
ide_get_geometry() and ide_get_bios_chs_trans() are only used by the TYPE_PC_MACHINE. "hw/ide.h" is a mixed bag of lost IDE declarations. In order to remove this (almost) pointless header soon, move these declarations to "hw/ide/internal.h". Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20230220091358.17038-18-philmd@linaro.org>