aboutsummaryrefslogtreecommitdiff
path: root/hw
AgeCommit message (Collapse)Author
2023-03-07vfio/migration: Rename entry pointsAlex Williamson
Pick names that align with the section drivers should use them from, avoiding the confusion of calling a _finalize() function from _exit() and generalizing the actual _finalize() to handle removing the viommu blocker. Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Link: https://lore.kernel.org/r/167820912978.606734.12740287349119694623.stgit@omen Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-03-07hw/mem/cxl_type3: Add CXL RAS Error Injection Support.Jonathan Cameron
CXL uses PCI AER Internal errors to signal to the host that an error has occurred. The host can then read more detailed status from the CXL RAS capability. For uncorrectable errors: support multiple injection in one operation as this is needed to reliably test multiple header logging support in an OS. The equivalent feature doesn't exist for correctable errors, so only one error need be injected at a time. Note: - Header content needs to be manually specified in a fashion that matches the specification for what can be in the header for each error type. Injection via QMP: { "execute": "qmp_capabilities" } ... { "execute": "cxl-inject-uncorrectable-errors", "arguments": { "path": "/machine/peripheral/cxl-pmem0", "errors": [ { "type": "cache-address-parity", "header": [ 3, 4] }, { "type": "cache-data-parity", "header": [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31] }, { "type": "internal", "header": [ 1, 2, 4] } ] }} ... { "execute": "cxl-inject-correctable-error", "arguments": { "path": "/machine/peripheral/cxl-pmem0", "type": "physical" } } Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Message-Id: <20230302133709.30373-9-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-03-07hw/pci/aer: Make PCIE AER error injection facility available for other ↵Jonathan Cameron
emulation to use. This infrastructure will be reused for CXL RAS error injection in patches that follow. Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Message-Id: <20230302133709.30373-8-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Fan Ni <fan.ni@samsung.com>
2023-03-07hw/cxl: Fix endian issues in CXL RAS capability defaults / masksJonathan Cameron
As these are about to be modified, fix the endian handle for this set of registers rather than making it worse. Note that CXL is currently only supported in QEMU on x86 (arm64 patches out of tree) so we aren't going to yet hit an problems with big endian. However it is good to avoid making things worse for that support in the future. Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Message-Id: <20230302133709.30373-7-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Fan Ni <fan.ni@samsung.com>
2023-03-07hw/mem/cxl-type3: Add AER extended capabilityJonathan Cameron
This enables AER error injection to function as expected. It is intended as a building block in enabling CXL RAS error injection in the following patches. Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Message-Id: <20230302133709.30373-6-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Fan Ni <fan.ni@samsung.com>
2023-03-07hw/pci-bridge/cxl_root_port: Wire up MSIJonathan Cameron
Done to avoid fixing ACPI route description of traditional PCI interrupts on q35 and because we should probably move with the times anyway. Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Message-Id: <20230302133709.30373-5-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Fan Ni <fan.ni@samsung.com>
2023-03-07hw/pci-bridge/cxl_root_port: Wire up AERJonathan Cameron
We are missing necessary config write handling for AER emulation in the CXL root port. Add it based on pcie_root_port.c Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Message-Id: <20230302133709.30373-4-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Fan Ni <fan.ni@samsung.com>
2023-03-07hw/pci/aer: Add missing routing for AER errorsJonathan Cameron
PCIe r6.0 Figure 6-3 "Pseudo Logic Diagram for Selected Error Message Control and Status Bits" includes a right hand branch under "All PCI Express devices" that allows for messages to be generated or sent onwards without SERR# being set as long as the appropriate per error class bit in the PCIe Device Control Register is set. Implement that branch thus enabling routing of ERR_COR, ERR_NONFATAL and ERR_FATAL under OSes that set these bits appropriately (e.g. Linux) Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Message-Id: <20230302133709.30373-3-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Fan Ni <fan.ni@samsung.com>
2023-03-07hw/pci/aer: Implement PCI_ERR_UNCOR_MASK registerJonathan Cameron
This register in AER should be both writeable and should have a default value with a couple of the errors masked including the Uncorrectable Internal Error used by CXL for it's error reporting. Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Message-Id: <20230302133709.30373-2-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Fan Ni <fan.ni@samsung.com>
2023-03-07pcihp: add ACPI PCI hotplug specific is_hotpluggable_bus() callbackIgor Mammedov
Provide pcihp specific callback to check if bus is hotpluggable and consolidate its scattered hotplug criteria there. While at it clean up no longer needed qbus_set_hotplug_handler(BUS(bus), NULL) workarounds since callback makes qbus_is_hotpluggable() return correct answer even if hotplug_handler is set on bus. PS: see ("pci: fix 'hotplugglable' property behavior") for details why callback was introduced. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20230302161543.286002-35-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-03-07pcihp: move fields enabling hotplug into AcpiPciHpStateIgor Mammedov
... instead of duplicating them in piix4 and lpc and then trying to pass them to pcihp routines as arguments. it simplifies call sites and places pcihp specific in its own structure. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20230302161543.286002-34-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-03-07acpi: pci: move out ACPI PCI hotplug generator from generic slot generator ↵Igor Mammedov
build_append_pci_bus_devices() Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20230302161543.286002-33-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-03-07acpi: pci: move BSEL into build_append_pcihp_slots()Igor Mammedov
Generic PCI enumeration code doesn't really need access to BSEL value, it is only used as means to decide if hotplug enumerator should be called. Use stateless object_property_find() to do that, and move the rest of BSEL handling into build_append_pcihp_slots() where it belongs. This cleans up generic code a bit from hotplug stuff and follow up patch will remove remaining call to build_append_pcihp_slots() from generic code, making it possible to use without ACPI PCI hotplug dependencies. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20230302161543.286002-32-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-03-07acpi: pci: drop BSEL usage when deciding that device isn't hotpluggableIgor Mammedov
previous commit ("pci: fix 'hotplugglable' property behavior") fixed pcie root port's 'hotpluggable' property to behave consistently. So we don't need a BSEL crutch anymore to see of device is not hotpluggable, drop it from 'generic' PCI slots description handling. BSEL is still used to decide if hotplug part should be called but that will be moved out of generic code to hotplug one by followup patches. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20230302161543.286002-31-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-03-07pci: move acpi-index uniqueness check to generic PCI device codeIgor Mammedov
acpi-index is now working with non-hotpluggable buses (pci/q35 machine hostbridge), it can be used even if ACPI PCI hotplug is disabled and as result acpi-index uniqueness check will be omitted (since the check is done by ACPI PCI hotplug handler, which isn't wired when ACPI PCI hotplug is disabled). Move check and related code to generic PCIDevice so it would be independent of ACPI PCI hotplug. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20230302161543.286002-30-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-03-07acpi: pci: describe all functions on populated slotsIgor Mammedov
describing all present devices on functions other than 0 was complicated when non hotplug and hotplug code was intermixed. So QEMU has been excluding non zero functions since they are not supported by hotplug code, then a condition to whitelist coldplugged bridges was added and later whitelisting of devices that advertise presence of their own AML description. With non hotplug and hotplug code separated, it is possible to relax rules and allow describing all non-hotpluggble functions and hence simplify conditions whether PCI device should be enumerated by generic (non-hotplug) code. Price of that simplification is an extra few Device() descriptors in DSDT exposing built-in chipset functions, which has no functional effect on guest side. Apart from that, the enumeration of non zero functions, allows to attach more NICs with acpi-index enabled directly on hostbridge (if hotplug is not required). Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20230302161543.286002-25-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-03-07acpi: pci: support acpi-index for non-hotpluggable devicesIgor Mammedov
Inject static _DSM (EDSM) if non-hotpluggable device has acpi-index configured on it. It lets use acpi-index non-hotpluggable devices / devices attached to non-hotpluggable bus. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20230302161543.286002-22-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-03-07acpi: pci: add EDSM method to DSDTIgor Mammedov
it's a helper method for acpi-index support on PCI buses that do no support or have disabled ACPI PCI hotplug or for non-hotpluggble endpoint devices. (like non-hotpluggble NICs, integrated endpoints and later for machines that do not support ACPI PCI hotplug) no functional change, commit adds only EDSM method in DSDT without any users. (the follow up patches will use it) Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20230302161543.286002-18-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-03-07pcihp: move PCI _DSM function 0 prolog into separate functionIgor Mammedov
it will be reused by follow up patches that will implement static _DSM for non-hotpluggable devices. no functional AML change, only context one, where 'cap' (Local1) initialization is moved after UUID/revision checks. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20230302161543.286002-15-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-03-07pci: fix 'hotplugglable' property behaviorIgor Mammedov
Currently the property may flip its state during VM bring up or just doesn't work as the name implies. In particular with PCIE root port that has 'hotplug={on|off}' property, and when it's turned off, one would expect 'hotpluggable' == false for any devices attached to it. Which is not the case since qbus_is_hotpluggable() used by the property just checks for presence of any hotplug_handler set on bus. The problem is that name BusState::hotplug_handler from its inception is misnomer, as it handles not only hotplug but also in many cases coldplug as well (i.e. generic wiring interface), and it's fine to have hotplug_handler set on bus while it doesn't support hotplug (ex. pcie-slot with hotplug=off). Another case of root port flipping 'hotpluggable' state when ACPI PCI hotplug is enabled in this case root port with 'hotplug=off' starts as hotpluggable and then later on, pcihp hotplug_handler clears hotplug_handler explicitly after checking root port's 'hotplug' property. So root-port hotpluggablity check sort of works if pcihp is enabled but is broken if pcihp is disabled. One way to deal with the issue is to ask hotplug_handler if bus it controls is hotpluggable or not. To do that add is_hotpluggable_bus() hook to HotplugHandler interface and use it in 'hotpluggable' property + teach pcie-slot to actually look into 'hotplug' property state before deciding if bus is hotpluggable. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20230302161543.286002-13-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-03-07pcihp: piix4: do not redirect hotplug controller to piix4 when ACPI hotplug ↵Igor Mammedov
is disabled commit [1] added ability to disable ACPI PCI hotplug on hostbridge but forgot to take into account that it should disable all ACPI hotplug machinery in case both hostbridge and bridge hotplug are disabled. Commit [2] tried to fix that, however it forgot to remove hotplug_handler override which hands hotplug control over to piix4 hotplug controller (uninitialized after [2]). As result at the time bridge is plugged in, its default (SHPC) hotplug handler is replaced by piix4 one in acpi_pcihp_device_plug_cb() ... if (!s->legacy_piix && ... qbus_set_hotplug_handler(BUS(sec), OBJECT(hotplug_dev)); which is acting on uninitialized s->legacy_piix value (0 by default) that was supposed to be initialized by acpi_pcihp_init(), that is no longer called due to following condition being false: piix4_acpi_system_hot_add_init() if (s->use_acpi_hotplug_bridge || s->use_acpi_root_pci_hotplug) { and the bridge ends up with piix4 as hotplug handler instead of shpc one. Followup hotplug on that bridge as result yields piix4 specific error: Error: Unsupported bus. Bus doesn't have property 'acpi-pcihp-bsel' set 1) 3d7e78aa777 (Introduce a new flag for i440fx to disable PCI hotplug on the root bus) 2) df4008c9c59 (piix4: don't reserve hw resources when hotplug is off globally) Fixes: df4008c9c59 (piix4: don't reserve hw resources when hotplug is off globally) Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20230302161543.286002-12-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-03-07x86: pcihp: fix missing bridge AML when intermediate root-port has ↵Igor Mammedov
'hotplug=off' set (I practice [1] hasn't broke anything since on hardware side we unset hotplug_handler on such intermediate port => hotplug behind it has never worked) When deciding if bridge should be described, the original condition was cold_plugged_bridge && pcihp_bridge_en which was replaced [1] by bridge has ACPI_PCIHP_PROP_BSEL the later however is not the same thing as the original and flips to false if intermediate bridge has hotplug turned off (root-port with 'hotplug=off' option). Since we already in build_pci_bridge_aml(), the question if it's bridge is answered. Use DeviceState::hotplugged to make decision if bridge should describe its slots. What's left out is pcihp_bridge_en, which tells us if ACPI bridge hotplug is enabled. With hotplug and non hotplug part now being mostly separated, omitting this check will only lead to colplugged bridges describe occupied slots in case when ACPI bridge hotplug is disabled. Which makes behavior consistent with occupied slots on hostbridge. Ex (pc/DSDT.hpbrroot diff): ... Device (S20) { Name (_ADR, 0x00040000) // _ADR: Address + Device (S08) + { + Name (_ADR, 0x00010000) // _ADR: Address + } + + Device (S10) + { + Name (_ADR, 0x00020000) // _ADR: Address + } } ... PS: testing shows that above doesn't affect adversely guest OS behavior: i.e. if ACPI bridge hotplug is enabled it's expected behaviour, and with ACPI bridge hotplug is disabled (a.k. native hotplug), it doesn't break slot enumeration nor native hotplug. (tested with RHEL9.0 and WS2022). 1) Fixes: 6c36ec46b0d ("pcihp: make bridge describe itself using AcpiDevAmlIfClass:build_dev_aml") Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20230302161543.286002-10-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-03-07x86: pcihp: fix missing PCNT callchain when intermediate root-port has ↵Igor Mammedov
'hotplug=off' set Beside BSEL numbers change (due to 2 extra root-ports in q35/miltibridge test), following change is expected: Scope (\_SB.PCI0) { ... + Scope (S50) + { + Scope (S00) + { + Method (PCNT, 0, NotSerialized) + { + BNUM = Zero + DVNT (PCIU, One) + DVNT (PCID, 0x03) + } + } + + Method (PCNT, 0, NotSerialized) + { + ^S00.PCNT + } + } ... Method (PCNT, 0, NotSerialized) { + ^S50.PCNT () ^S13.PCNT () ^S12.PCNT () ^S11.PCNT () I practice [1] hasn't broke anything since on hardware side we unset hotplug_handler on such intermediate port => hotplug behind it has not been properly wired and as result not worked. 1) Fixes: ddab4d3fae4e8 ("pcihp: compose PCNT callchain right before its user _GPE._E01") Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20230302161543.286002-8-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-03-07vdpa: return VHOST_F_LOG_ALL in vhost-vdpa devicesEugenio Pérez
vhost-vdpa devices can return this feature now that blockers have been set in case some features are not met. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230303172445.1089785-15-eperezma@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-03-07vdpa: block migration if SVQ does not admit a featureEugenio Pérez
Next patches enable devices to be migrated even if vdpa netdev has not been started with x-svq. However, not all devices are migratable, so we need to block migration if we detect that. Block migration if we detect the device expose a feature SVQ does not know how to work with. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20230303172445.1089785-13-eperezma@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-03-07vdpa net: block migration if the device has CVQEugenio Pérez
Devices with CVQ need to migrate state beyond vq state. Leaving this to future series. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20230303172445.1089785-11-eperezma@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-03-07vdpa: disable RAM block discard only for the first deviceEugenio Pérez
Although it does not make a big difference, its more correct and simplifies the cleanup path in subsequent patches. Move ram_block_discard_disable(false) call to the top of vhost_vdpa_cleanup because: * We cannot use vhost_vdpa_first_dev after dev->opaque = NULL assignment. * Improve the stack order in cleanup: since it is the last action taken in init, it should be the first at cleanup. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20230303172445.1089785-10-eperezma@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-03-07vdpa: move vhost reset after get vring baseEugenio Pérez
The function vhost.c:vhost_dev_stop calls vhost operation vhost_dev_start(false). In the case of vdpa it totally reset and wipes the device, making the fetching of the vring base (virtqueue state) totally useless. The kernel backend does not use vhost_dev_start vhost op callback, but vhost-user do. A patch to make vhost_user_dev_start more similar to vdpa is desirable, but it can be added on top. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20230303172445.1089785-8-eperezma@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-03-07vdpa: add vhost_vdpa_suspendEugenio Pérez
The function vhost.c:vhost_dev_stop fetches the vring base so the vq state can be migrated to other devices. However, this is unreliable in vdpa, since we didn't signal the device to suspend the queues, making the value fetched useless. Suspend the device if possible before fetching first and subsequent vring bases. Moreover, vdpa totally reset and wipes the device at the last device before fetch its vrings base, making that operation useless in the last device. This will be fixed in later patches of this series. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20230303172445.1089785-7-eperezma@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-03-07vdpa: add vhost_vdpa->suspended parameterEugenio Pérez
This allows vhost_vdpa to track if it is safe to get the vring base from the device or not. If it is not, vhost can fall back to fetch idx from the guest buffer again. No functional change intended in this patch, later patches will use this field. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20230303172445.1089785-6-eperezma@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-03-07vdpa: rewind at get_base, not set_baseEugenio Pérez
At this moment it is only possible to migrate to a vdpa device running with x-svq=on. As a protective measure, the rewind of the inflight descriptors was done at the destination. That way if the source sent a virtqueue with inuse descriptors they are always discarded. Since this series allows to migrate also to passthrough devices with no SVQ, the right thing to do is to rewind at the source so the base of vrings are correct. Support for inflight descriptors may be added in the future. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230303172445.1089785-5-eperezma@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-03-07vdpa: Negotiate _F_SUSPEND featureEugenio Pérez
This is needed for qemu to know it can suspend the device to retrieve its status and enable SVQ with it, so all the process is transparent to the guest. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230303172445.1089785-4-eperezma@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-03-07vdpa: Remember last call fd setEugenio Pérez
As SVQ can be enabled dynamically at any time, it needs to store call fd always. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20230303172445.1089785-3-eperezma@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-03-07cryptodev: Use CryptoDevBackendOpInfo for operationzhenwei pi
Move queue_index, CryptoDevCompletionFunc and opaque into struct CryptoDevBackendOpInfo, then cryptodev_backend_crypto_operation() needs an argument CryptoDevBackendOpInfo *op_info only. And remove VirtIOCryptoReq from cryptodev. It's also possible to hide VirtIOCryptoReq into virtio-crypto.c in the next step. (In theory, VirtIOCryptoReq is a private structure used by virtio-crypto only) Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> Message-Id: <20230301105847.253084-9-pizhenwei@bytedance.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-03-07cryptodev: Introduce server type in QAPIzhenwei pi
Introduce cryptodev service type in cryptodev.json, then apply this to related codes. Now we can remove VIRTIO_CRYPTO_SERVICE_xxx dependence from QEMU cryptodev. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> Message-Id: <20230301105847.253084-5-pizhenwei@bytedance.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-03-07cryptodev: Introduce cryptodev alg type in QAPIzhenwei pi
Introduce cryptodev alg type in cryptodev.json, then apply this to related codes, and drop 'enum CryptoDevBackendAlgType'. There are two options: 1, { 'enum': 'QCryptodevBackendAlgType', 'prefix': 'CRYPTODEV_BACKEND_ALG', 'data': ['sym', 'asym']} Then we can keep 'CRYPTODEV_BACKEND_ALG_SYM' and avoid lots of changes. 2, changes in this patch(with prefix 'QCRYPTODEV_BACKEND_ALG'). To avoid breaking the rule of QAPI, use 2 here. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> Message-Id: <20230301105847.253084-4-pizhenwei@bytedance.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-03-07vfio/migration: Query device dirty page tracking supportJoao Martins
Now that everything has been set up for device dirty page tracking, query the device for device dirty page tracking support. Signed-off-by: Avihai Horon <avihaih@nvidia.com> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/r/20230307125450.62409-15-joao.m.martins@oracle.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-03-07vfio/migration: Block migration with vIOMMUJoao Martins
Migrating with vIOMMU will require either tracking maximum IOMMU supported address space (e.g. 39/48 address width on Intel) or range-track current mappings and dirty track the new ones post starting dirty tracking. This will be done as a separate series, so add a live migration blocker until that is fixed. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/r/20230307125450.62409-14-joao.m.martins@oracle.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-03-07vfio/common: Add device dirty page bitmap syncJoao Martins
Add device dirty page bitmap sync functionality. This uses the device DMA logging uAPI to sync dirty page bitmap from the device. Device dirty page bitmap sync is used only if all devices within a container support device dirty page tracking. Signed-off-by: Avihai Horon <avihaih@nvidia.com> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/r/20230307125450.62409-13-joao.m.martins@oracle.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-03-07vfio/common: Extract code from vfio_get_dirty_bitmap() to new functionAvihai Horon
Extract the VFIO_IOMMU_DIRTY_PAGES ioctl code in vfio_get_dirty_bitmap() to its own function. This will help the code to be more readable after next patch will add device dirty page bitmap sync functionality. Signed-off-by: Avihai Horon <avihaih@nvidia.com> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/r/20230307125450.62409-12-joao.m.martins@oracle.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-03-07vfio/common: Add device dirty page tracking start/stopJoao Martins
Add device dirty page tracking start/stop functionality. This uses the device DMA logging uAPI to start and stop dirty page tracking by device. Device dirty page tracking is used only if all devices within a container support device dirty page tracking. Signed-off-by: Avihai Horon <avihaih@nvidia.com> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Link: https://lore.kernel.org/r/20230307125450.62409-11-joao.m.martins@oracle.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-03-07includes: move tb_flush into its own headerAlex Bennée
This aids subsystems (like gdbstub) that want to trigger a flush without pulling target specific headers. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20230302190846.2593720-8-alex.bennee@linaro.org> Message-Id: <20230303025805.625589-8-richard.henderson@linaro.org>
2023-03-07i386/xen: Initialize Xen backends from pc_basic_device_init() for emulationDavid Woodhouse
Now that all the work is done to enable the PV backends to work without actual Xen, instantiate the bus from pc_basic_device_init() for emulated mode. This allows us finally to launch an emulated Xen guest with PV disk. qemu-system-x86_64 -serial mon:stdio -M q35 -cpu host -display none \ -m 1G -smp 2 -accel kvm,xen-version=0x4000a,kernel-irqchip=split \ -kernel bzImage -append "console=ttyS0 root=/dev/xvda1" \ -drive file=/var/lib/libvirt/images/fedora28.qcow2,if=none,id=disk \ -device xen-disk,drive=disk,vdev=xvda If we use -M pc instead of q35, we can even add an IDE disk and boot a guest image normally through grub. But q35 gives us AHCI and that isn't unplugged by the Xen magic, so the guests ends up seeing "both" disks. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-07hw/xen: Implement soft reset for emulated gnttabDavid Woodhouse
This is only part of it; we will also need to get the PV back end drivers to tear down their own mappings (or do it for them, but they kind of need to stop using the pointers too). Some more work on the actual PV back ends and xen-bus code is going to be needed to really make soft reset and migration fully functional, and this part is the basis for that. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-07hw/xen: Map guest XENSTORE_PFN grant in emulated XenstoreDavid Woodhouse
We don't actually access the guest's page through the grant, because this isn't real Xen, and we can just use the page we gave it in the first place. Map the grant anyway, mostly for cosmetic purposes so it *looks* like it's in use in the guest-visible grant table. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-07hw/xen: Add emulated implementation of XenStore operationsDavid Woodhouse
Now that we have an internal implementation of XenStore, we can populate the xenstore_backend_ops to allow PV backends to talk to it. Watches can't be processed with immediate callbacks because that would call back into XenBus code recursively. Defer them to a QEMUBH to be run as appropriate from the main loop. We use a QEMUBH per XS handle, and it walks all the watches (there shouldn't be many per handle) to fire any which have pending events. We *could* have done it differently but this allows us to use the same struct watch_event as we have for the guest side, and keeps things relatively simple. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-07hw/xen: Add emulated implementation of grant table operationsDavid Woodhouse
This is limited to mapping a single grant at a time, because under Xen the pages are mapped *contiguously* into qemu's address space, and that's very hard to do when those pages actually come from anonymous mappings in qemu in the first place. Eventually perhaps we can look at using shared mappings of actual objects for system RAM, and then we can make new mappings of the same backing store (be it deleted files, shmem, whatever). But for now let's stick to a page at a time. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-07hw/xen: Hook up emulated implementation for event channel operationsDavid Woodhouse
We provided the backend-facing evtchn functions very early on as part of the core Xen platform support, since things like timers and xenstore need to use them. By what may or may not be an astonishing coincidence, those functions just *happen* all to have exactly the right function prototypes to slot into the evtchn_backend_ops table and be called by the PV backends. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-07hw/xen: Only advertise ring-page-order for xen-block if gnttab supports itDavid Woodhouse
Whem emulating Xen, multi-page grants are distinctly non-trivial and we have elected not to support them for the time being. Don't advertise them to the guest. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-07hw/xen: Avoid crash when backend watch fires too earlyPaul Durrant
The xen-block code ends up calling aio_poll() through blkconf_geometry(), which means we see watch events during the indirect call to xendev_class->realize() in xen_device_realize(). Unfortunately this call is made before populating the initial frontend and backend device nodes in xenstore and hence xen_block_frontend_changed() (which is called from a watch event) fails to read the frontend's 'state' node, and hence believes the device is being torn down. This in-turn sets the backend state to XenbusStateClosed and causes the device to be deleted before it is fully set up, leading to the crash. By simply moving the call to xendev_class->realize() after the initial xenstore nodes are populated, this sorry state of affairs is avoided. Reported-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Paul Durrant <pdurrant@amazon.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>