aboutsummaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2022-05-13hw/cxl/device: Plumb real Label Storage Area (LSA) sizingBen Widawsky
This should introduce no change. Subsequent work will make use of this new class member. Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Message-Id: <20220429144110.25167-21-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-13hw/cxl/device: Add a memory device (8.2.8.5)Ben Widawsky
A CXL memory device (AKA Type 3) is a CXL component that contains some combination of volatile and persistent memory. It also implements the previously defined mailbox interface as well as the memory device firmware interface. Although the memory device is configured like a normal PCIe device, the memory traffic is on an entirely separate bus conceptually (using the same physical wires as PCIe, but different protocol). Once the CXL topology is fully configure and address decoders committed, the guest physical address for the memory device is part of a larger window which is owned by the platform. The creation of these windows is later in this series. The following example will create a 256M device in a 512M window: -object "memory-backend-file,id=cxl-mem1,share,mem-path=cxl-type3,size=512M" -device "cxl-type3,bus=rp0,memdev=cxl-mem1,id=cxl-pmem0" Note: Dropped PCDIMM info interfaces for now. They can be added if appropriate at a later date. Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Message-Id: <20220429144110.25167-18-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-13hw/pxb: Allow creation of a CXL PXB (host bridge)Ben Widawsky
This works like adding a typical pxb device, except the name is 'pxb-cxl' instead of 'pxb-pcie'. An example command line would be as follows: -device pxb-cxl,id=cxl.0,bus="pcie.0",bus_nr=1 A CXL PXB is backward compatible with PCIe. What this means in practice is that an operating system that is unaware of CXL should still be able to enumerate this topology as if it were PCIe. One can create multiple CXL PXB host bridges, but a host bridge can only be connected to the main root bus. Host bridges cannot appear elsewhere in the topology. Note that as of this patch, the ACPI tables needed for the host bridge (specifically, an ACPI object in _SB named ACPI0016 and the CEDT) aren't created. So while this patch internally creates it, it cannot be properly used by an operating system or other system software. Also necessary is to add an exception to scripts/device-crash-test similar to that for exiting pxb as both must created on a PCIexpress host bus. Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Signed-off-by: Jonathan.Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20220429144110.25167-15-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-13cxl: Machine level control on whether CXL support is enabledJonathan Cameron
There are going to be some potential overheads to CXL enablement, for example the host bridge region reserved in memory maps. Add a machine level control so that CXL is disabled by default. Signed-off-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20220429144110.25167-14-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-13hw/pci/cxl: Create a CXL bus typeBen Widawsky
The easiest way to differentiate a CXL bus, and a PCIE bus is using a flag. A CXL bus, in hardware, is backward compatible with PCIE, and therefore the code tries pretty hard to keep them in sync as much as possible. The other way to implement this would be to try to cast the bus to the correct type. This is less code and useful for debugging via simply looking at the flags. Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20220429144110.25167-13-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-13hw/cxl/device: Timestamp implementation (8.2.9.3)Ben Widawsky
Errata F4 to CXL 2.0 clarified the meaning of the timer as the sum of the value set with the timestamp set command and the number of nano seconds since it was last set. Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20220429144110.25167-10-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-13hw/cxl/device: Add memory device utilitiesBen Widawsky
Memory devices implement extra capabilities on top of CXL devices. This adds support for that. A large part of memory devices is the mailbox/command interface. All of the mailbox handling is done in the mailbox-utils library. Longer term, new CXL devices that are being emulated may want to handle commands differently, and therefore would need a mechanism to opt in/out of the specific generic handlers. As such, this is considered sufficient for now, but may need more depth in the future. Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20220429144110.25167-8-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-13hw/cxl/device: Implement basic mailbox (8.2.8.4)Ben Widawsky
This is the beginning of implementing mailbox support for CXL 2.0 devices. The implementation recognizes when the doorbell is rung, handles the command/payload, clears the doorbell while returning error codes and data. Generally the mailbox mechanism is designed to permit communication between the host OS and the firmware running on the device. For our purposes, we emulate both the firmware, implemented primarily in cxl-mailbox-utils.c, and the hardware. No commands are implemented yet. Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20220429144110.25167-7-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-13hw/cxl/device: Implement the CAP array (8.2.8.1-2)Ben Widawsky
This implements all device MMIO up to the first capability. That includes the CXL Device Capabilities Array Register, as well as all of the CXL Device Capability Header Registers. The latter are filled in as they are implemented in the following patches. Endianness and alignment are managed by softmmu memory core. Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20220429144110.25167-6-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-13hw/cxl/device: Introduce a CXL device (8.2.8)Ben Widawsky
A CXL device is a type of CXL component. Conceptually, a CXL device would be a leaf node in a CXL topology. From an emulation perspective, CXL devices are the most complex and so the actual implementation is reserved for discrete commits. This new device type is specifically catered towards the eventual implementation of a Type3 CXL.mem device, 8.2.8.5 in the CXL 2.0 specification. Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Adam Manzanares <a.manzanares@samsung.com> Message-Id: <20220429144110.25167-5-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-13hw/cxl/component: Introduce CXL components (8.1.x, 8.2.5)Ben Widawsky
A CXL 2.0 component is any entity in the CXL topology. All components have a analogous function in PCIe. Except for the CXL host bridge, all have a PCIe config space that is accessible via the common PCIe mechanisms. CXL components are enumerated via DVSEC fields in the extended PCIe header space. CXL components will minimally implement some subset of CXL.mem and CXL.cache registers defined in 8.2.5 of the CXL 2.0 specification. Two headers and a utility library are introduced to support the minimum functionality needed to enumerate components. The cxl_pci header manages bits associated with PCI, specifically the DVSEC and related fields. The cxl_component.h variant has data structures and APIs that are useful for drivers implementing any of the CXL 2.0 components. The library takes care of making use of the DVSEC bits and the CXL.[mem|cache] registers. Per spec, the registers are little endian. None of the mechanisms required to enumerate a CXL capable hostbridge are introduced at this point. Note that the CXL.mem and CXL.cache registers used are always 4B wide. It's possible in the future that this constraint will not hold. Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Adam Manzanares <a.manzanares@samsung.com> Message-Id: <20220429144110.25167-3-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-13hw/pci/cxl: Add a CXL component type (interface)Ben Widawsky
A CXL component is a hardware entity that implements CXL component registers from the CXL 2.0 spec (8.2.3). Currently these represent 3 general types. 1. Host Bridge 2. Ports (root, upstream, downstream) 3. Devices (memory, other) A CXL component can be conceptually thought of as a PCIe device with extra functionality when enumerated and enabled. For this reason, CXL does here, and will continue to add on to existing PCI code paths. Host bridges will typically need to be handled specially and so they can implement this newly introduced interface or not. All other components should implement this interface. Implementing this interface allows the core PCI code to treat these devices as special where appropriate. Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Adam Manzanares <a.manzanares@samsung.com> Message-Id: <20220429144110.25167-2-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-12Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into stagingRichard Henderson
* small cleanups for pc-bios/optionrom Makefiles * checkpatch: fix g_malloc check * fix mremap() and RDMA detection * confine igd-passthrough-isa-bridge to Xen-enabled builds * cover PCI in arm-virt machine qtests * add -M boot and -M mem compound properties * bump SLIRP submodule * support CFI with system libslirp (>= 4.7) * clean up CoQueue wakeup functions * fix vhost-vsock regression * fix --disable-vnc compilation * other minor bugfixes # -----BEGIN PGP SIGNATURE----- # # iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmJ8/KMUHHBib256aW5p # QHJlZGhhdC5jb20ACgkQv/vSX3jHroNTTAf9Et1C8iZn+OlZi99wMEeMy8a4mIE5 # CpkBpFphhkBvt3AH7XNsCyL4Gea4QgsI7nOIEVUwvW7gPf85PiBUX8mjrIVg3x1k # bmMEwMKSTYPmDieAnYBP9zCqZQXNYP8L8WxVs2jFY2GXZ2ZogODYFbvCY4yEEB72 # UR6uIvQRdpiB6BEj8UZ+5i+sDtb0zxqrjzUz8T/PJC9/2JSNgi+sAWWQoQT3PPU7 # R7z2nmEa1VeVLPP6mUHvJKhBltVXF+LyIjQHvo+Tp9tSqp9JwXfFBNQ5W/MFes2D # skF47N7PdgKRH9Dp4r0j+MqBwoAq86+ao+MKsbQ1Gb91HhoCWt/MrVrVyg== # =1E6P # -----END PGP SIGNATURE----- # gpg: Signature made Thu 12 May 2022 05:25:07 AM PDT # gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83 # gpg: issuer "pbonzini@redhat.com" # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [undefined] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [undefined] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * tag 'for-upstream' of https://gitlab.com/bonzini/qemu: (27 commits) vmxcap: add tertiary execution controls vl: make machine type deprecation a warning meson: link libpng independent of vnc vhost-backend: do not depend on CONFIG_VHOST_VSOCK coroutine-lock: qemu_co_queue_restart_all is a coroutine-only qemu_co_enter_all coroutine-lock: introduce qemu_co_queue_enter_all coroutine-lock: qemu_co_queue_next is a coroutine-only qemu_co_enter_next net: slirp: allow CFI with libslirp >= 4.7 net: slirp: add support for CFI-friendly timer API net: slirp: switch to slirp_new net: slirp: introduce a wrapper struct for QemuTimer slirp: bump submodule past 4.7 release machine: move more memory validation to Machine object machine: make memory-backend a link property machine: add mem compound property machine: add boot compound property machine: use QAPI struct for boot configuration tests/qtest/libqos: Add generic pci host bridge in arm-virt machine tests/qtest/libqos: Skip hotplug tests if pci root bus is not hotpluggable tests/qtest/libqos/pci: Introduce pio_limit ... Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2022-05-12nbd/server: Allow MULTI_CONN for shared writable exportsEric Blake
According to the NBD spec, a server that advertises NBD_FLAG_CAN_MULTI_CONN promises that multiple client connections will not see any cache inconsistencies: when properly separated by a single flush, actions performed by one client will be visible to another client, regardless of which client did the flush. We always satisfy these conditions in qemu - even when we support multiple clients, ALL clients go through a single point of reference into the block layer, with no local caching. The effect of one client is instantly visible to the next client. Even if our backend were a network device, we argue that any multi-path caching effects that would cause inconsistencies in back-to-back actions not seeing the effect of previous actions would be a bug in that backend, and not the fault of caching in qemu. As such, it is safe to unconditionally advertise CAN_MULTI_CONN for any qemu NBD server situation that supports parallel clients. Note, however, that we don't want to advertise CAN_MULTI_CONN when we know that a second client cannot connect (for historical reasons, qemu-nbd defaults to a single connection while nbd-server-add and QMP commands default to unlimited connections; but we already have existing means to let either style of NBD server creation alter those defaults). This is visible by no longer advertising MULTI_CONN for 'qemu-nbd -r' without -e, as in the iotest nbd-qemu-allocation. The harder part of this patch is setting up an iotest to demonstrate behavior of multiple NBD clients to a single server. It might be possible with parallel qemu-io processes, but I found it easier to do in python with the help of libnbd, and help from Nir and Vladimir in writing the test. Signed-off-by: Eric Blake <eblake@redhat.com> Suggested-by: Nir Soffer <nsoffer@redhat.com> Suggested-by: Vladimir Sementsov-Ogievskiy <v.sementsov-og@mail.ru> Message-Id: <20220512004924.417153-3-eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-05-12qemu-nbd: Pass max connections to blockdev layerEric Blake
The next patch wants to adjust whether the NBD server code advertises MULTI_CONN based on whether it is known if the server limits to exactly one client. For a server started by QMP, this information is obtained through nbd_server_start (which can support more than one export); but for qemu-nbd (which supports exactly one export), it is controlled only by the command-line option -e/--shared. Since we already have a hook function used by qemu-nbd, it's easiest to just alter its signature to fit our needs. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20220512004924.417153-2-eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-05-12coroutine-lock: qemu_co_queue_restart_all is a coroutine-only qemu_co_enter_allPaolo Bonzini
qemu_co_queue_restart_all is basically the same as qemu_co_enter_all but without a QemuLockable argument. That's perfectly fine, but only as long as the function is marked coroutine_fn. If used outside coroutine context, qemu_co_queue_wait will attempt to take the lock and that is just broken: if you are calling qemu_co_queue_restart_all outside coroutine context, the lock is going to be a QemuMutex which cannot be taken twice by the same thread. The patch adds the marker to qemu_co_queue_restart_all and to its sole non-coroutine_fn caller; it then reimplements the function in terms of qemu_co_enter_all_impl, to remove duplicated code and to clarify that the latter also works in coroutine context. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20220427130830.150180-4-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-12coroutine-lock: introduce qemu_co_queue_enter_allPaolo Bonzini
Because qemu_co_queue_restart_all does not release the lock, it should be used only in coroutine context. Introduce a new function that, like qemu_co_enter_next, does release the lock, and use it whenever qemu_co_queue_restart_all was used outside coroutine context. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20220427130830.150180-3-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-12coroutine-lock: qemu_co_queue_next is a coroutine-only qemu_co_enter_nextPaolo Bonzini
qemu_co_queue_next is basically the same as qemu_co_enter_next but without a QemuLockable argument. That's perfectly fine, but only as long as the function is marked coroutine_fn. If used outside coroutine context, qemu_co_queue_wait will attempt to take the lock and that is just broken: if you are calling qemu_co_queue_next outside coroutine context, the lock is going to be a QemuMutex which cannot be taken twice by the same thread. The patch adds the marker and reimplements qemu_co_queue_next in terms of qemu_co_enter_next_impl, to remove duplicated code and to clarify that the latter also works in coroutine context. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20220427130830.150180-2-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-12machine: make memory-backend a link propertyPaolo Bonzini
Handle HostMemoryBackend creation and setting of ms->ram entirely in machine_run_board_init. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220414165300.555321-5-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-12machine: add boot compound propertyPaolo Bonzini
Make -boot syntactic sugar for a compound property "-machine boot.{order,menu,...}". machine_boot_parse is replaced by the setter for the property. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220414165300.555321-3-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-12machine: use QAPI struct for boot configurationPaolo Bonzini
As part of converting -boot to a property with a QAPI type, define the struct and use it throughout QEMU to access boot configuration. machine_boot_parse takes care of doing the QemuOpts->QAPI conversion by hand, for now. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220414165300.555321-2-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-12coroutine: Rename qemu_coroutine_inc/dec_pool_size()Kevin Wolf
It's true that these functions currently affect the batch size in which coroutines are reused (i.e. moved from the global release pool to the allocation pool of a specific thread), but this is a bug and will be fixed in a separate patch. In fact, the comment in the header file already just promises that it influences the pool size, so reflect this in the name of the functions. As a nice side effect, the shorter function name makes some line wrapping unnecessary. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20220510151020.105528-2-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-05-12hw/xen/xen_pt: Confine igd-passthrough-isa-bridge to XENBernhard Beschow
igd-passthrough-isa-bridge is only requested in xen_pt but was implemented in pc_piix.c. This caused xen_pt to dependend on i386/pc which is hereby resolved. Signed-off-by: Bernhard Beschow <shentey@gmail.com> Acked-by: Anthony PERARD <anthony.perard@citrix.com> Message-Id: <20220326165825.30794-2-shentey@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-11Clean up decorations and whitespace around header guardsMarkus Armbruster
Cleaned up with scripts/clean-header-guards.pl. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20220506134911.2856099-5-armbru@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2022-05-11Normalize header guard symbol definitionMarkus Armbruster
We commonly define the header guard symbol without an explicit value. Normalize the exceptions. Done with scripts/clean-header-guards.pl. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20220506134911.2856099-4-armbru@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2022-05-11Clean up ill-advised or unusual header guardsMarkus Armbruster
Leading underscores are ill-advised because such identifiers are reserved. Trailing underscores are merely ugly. Strip both. Our header guards commonly end in _H. Normalize the exceptions. Macros should be ALL_CAPS. Normalize the exception. Done with scripts/clean-header-guards.pl. include/hw/xen/interface/ and tools/virtiofsd/ left alone, because these were imported from Xen and libfuse respectively. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20220506134911.2856099-3-armbru@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2022-05-11Clean up header guards that don't match their file nameMarkus Armbruster
Header guard symbols should match their file name to make guard collisions less likely. Cleaned up with scripts/clean-header-guards.pl, followed by some renaming of new guard symbols picked by the script to better ones. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20220506134911.2856099-2-armbru@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> [Change to generated file ebpf/rss.bpf.skeleton.h backed out]
2022-05-09Merge tag 'block-pull-request' of https://gitlab.com/stefanha/qemu into stagingRichard Henderson
Pull request - Add new thread-pool-min/thread-pool-max parameters to control the thread pool used for async I/O. - Fix virtio-scsi IOThread 100% CPU consumption QEMU 7.0 regression. # -----BEGIN PGP SIGNATURE----- # # iQEzBAABCAAdFiEEhpWov9P5fNqsNXdanKSrs4Grc8gFAmJ5DqgACgkQnKSrs4Gr # c8iAqAf/WEJzEso0Hu3UUYJi2lAXpLxWPjoNBlPdQlKIJ/I0zQIF0P7GeCifF+0l # iMjgBv0ofyAuV47gaTJlVrAR75+hJ/IXNDhnu3UuvNWfVOqvksgw6kuHkMo9A2hC # 4tIHEU9J8jbQSSdQTaZR8Zj4FX1/zcxMBAXT3YO3De6zo78RatBTuNP4dsZzt8bI # Qs1a4A0p2ScNXK8EcF4QwAWfoxu9OPPzN52DBCNxcIcnn0SUab4NbDxzpRV4ZhDP # 08WoafI5O+2Kb36QysJN01LqajHrClG/fozrPzBLq5aZUK3xewJGB1hEdGTLkkmz # NJNBg5Ldszwj4PDZ1dFU3/03aigb3g== # =t5eR # -----END PGP SIGNATURE----- # gpg: Signature made Mon 09 May 2022 05:52:56 AM PDT # gpg: using RSA key 8695A8BFD3F97CDAAC35775A9CA4ABB381AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [full] # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" [full] * tag 'block-pull-request' of https://gitlab.com/stefanha/qemu: virtio-scsi: move request-related items from .h to .c virtio-scsi: clean up virtio_scsi_handle_cmd_vq() virtio-scsi: clean up virtio_scsi_handle_ctrl_vq() virtio-scsi: clean up virtio_scsi_handle_event_vq() virtio-scsi: don't waste CPU polling the event virtqueue virtio-scsi: fix ctrl and event handler functions in dataplane mode util/event-loop-base: Introduce options to set the thread pool size util/main-loop: Introduce the main loop into QOM Introduce event-loop-base abstract class Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2022-05-09virtio-scsi: move request-related items from .h to .cStefan Hajnoczi
There is no longer a need to expose the request and related APIs in virtio-scsi.h since there are no callers outside virtio-scsi.c. Note the block comment in VirtIOSCSIReq has been adjusted to meet the coding style. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20220427143541.119567-7-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-05-09virtio-scsi: clean up virtio_scsi_handle_cmd_vq()Stefan Hajnoczi
virtio_scsi_handle_cmd_vq() is only called from hw/scsi/virtio-scsi.c now and its return value is no longer used. Remove the function prototype from virtio-scsi.h and drop the return value. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20220427143541.119567-6-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-05-09virtio-scsi: clean up virtio_scsi_handle_ctrl_vq()Stefan Hajnoczi
virtio_scsi_handle_ctrl_vq() is only called from hw/scsi/virtio-scsi.c now and its return value is no longer used. Remove the function prototype from virtio-scsi.h and drop the return value. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20220427143541.119567-5-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-05-09virtio-scsi: clean up virtio_scsi_handle_event_vq()Stefan Hajnoczi
virtio_scsi_handle_event_vq() is only called from hw/scsi/virtio-scsi.c now and its return value is no longer used. Remove the function prototype from virtio-scsi.h and drop the return value. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20220427143541.119567-4-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-05-09virtio-scsi: don't waste CPU polling the event virtqueueStefan Hajnoczi
The virtio-scsi event virtqueue is not emptied by its handler function. This is typical for rx virtqueues where the device uses buffers when some event occurs (e.g. a packet is received, an error condition happens, etc). Polling non-empty virtqueues wastes CPU cycles. We are not waiting for new buffers to become available, we are waiting for an event to occur, so it's a misuse of CPU resources to poll for buffers. Introduce the new virtio_queue_aio_attach_host_notifier_no_poll() API, which is identical to virtio_queue_aio_attach_host_notifier() except that it does not poll the virtqueue. Before this patch the following command-line consumed 100% CPU in the IOThread polling and calling virtio_scsi_handle_event(): $ qemu-system-x86_64 -M accel=kvm -m 1G -cpu host \ --object iothread,id=iothread0 \ --device virtio-scsi-pci,iothread=iothread0 \ --blockdev file,filename=test.img,aio=native,cache.direct=on,node-name=drive0 \ --device scsi-hd,drive=drive0 After this patch CPU is no longer wasted. Reported-by: Nir Soffer <nsoffer@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Nir Soffer <nsoffer@redhat.com> Message-id: 20220427143541.119567-3-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-05-09util/event-loop-base: Introduce options to set the thread pool sizeNicolas Saenz Julienne
The thread pool regulates itself: when idle, it kills threads until empty, when in demand, it creates new threads until full. This behaviour doesn't play well with latency sensitive workloads where the price of creating a new thread is too high. For example, when paired with qemu's '-mlock', or using safety features like SafeStack, creating a new thread has been measured take multiple milliseconds. In order to mitigate this let's introduce a new 'EventLoopBase' property to set the thread pool size. The threads will be created during the pool's initialization or upon updating the property's value, remain available during its lifetime regardless of demand, and destroyed upon freeing it. A properly characterized workload will then be able to configure the pool to avoid any latency spikes. Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Message-id: 20220425075723.20019-4-nsaenzju@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-05-09util/main-loop: Introduce the main loop into QOMNicolas Saenz Julienne
'event-loop-base' provides basic property handling for all 'AioContext' based event loops. So let's define a new 'MainLoopClass' that inherits from it. This will permit tweaking the main loop's properties through qapi as well as through the command line using the '-object' keyword[1]. Only one instance of 'MainLoopClass' might be created at any time. 'EventLoopBaseClass' learns a new callback, 'can_be_deleted()' so as to mark 'MainLoop' as non-deletable. [1] For example: -object main-loop,id=main-loop,aio-max-batch=<value> Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Message-id: 20220425075723.20019-3-nsaenzju@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-05-09Introduce event-loop-base abstract classNicolas Saenz Julienne
Introduce the 'event-loop-base' abstract class, it'll hold the properties common to all event loops and provide the necessary hooks for their creation and maintenance. Then have iothread inherit from it. EventLoopBaseClass is defined as user creatable and provides a hook for its children to attach themselves to the user creatable class 'complete' function. It also provides an update_params() callback to propagate property changes onto its children. The new 'event-loop-base' class will live in the root directory. It is built on its own using the 'link_whole' option (there are no direct function dependencies between the class and its children, it all happens trough 'constructor' magic). And also imposes new compilation dependencies: qom <- event-loop-base <- blockdev (iothread.c) And in subsequent patches: qom <- event-loop-base <- qemuutil (util/main-loop.c) All this forced some amount of reordering in meson.build: - Moved qom build definition before qemuutil. Doing it the other way around (i.e. moving qemuutil after qom) isn't possible as a lot of core libraries that live in between the two depend on it. - Process the 'hw' subdir earlier, as it introduces files into the 'qom' source set. No functional changes intended. Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Message-id: 20220425075723.20019-2-nsaenzju@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-05-09Warn user if the vga flag is passed but no vga device is createdGautam Agrawal
A global boolean variable "vga_interface_created"(declared in softmmu/globals.c) has been used to track the creation of vga interface. If the vga flag is passed in the command line "default_vga"(declared in softmmu/vl.c) variable is set to 0. To warn user, the condition checks if vga_interface_created is false and default_vga is equal to 0. If "-vga none" is passed, this patch will not warn the user regarding the creation of VGA device. The warning "A -vga option was passed but this machine type does not use that option; no VGA device has been created" is logged if vga flag is passed but no vga device is created. This patch has been tested for x86_64, i386, sparc, sparc64 and arm boards. Signed-off-by: Gautam Agrawal <gautamnagrawal@gmail.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/581 Message-Id: <20220501122505.29202-1-gautamnagrawal@gmail.com> [thuth: Fix wrong warning with "-device" in some cases as reported by Paolo] Signed-off-by: Thomas Huth <thuth@redhat.com>
2022-05-09disas: Remove old libopcode ppc disassemblerThomas Huth
Capstone should be superior to the old libopcode disassembler, so we can drop the old file nowadays. Message-Id: <20220505173619.488350-1-thuth@redhat.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
2022-05-09disas: Remove old libopcode i386 disassemblerThomas Huth
Capstone should be superior to the old libopcode disassembler, so we can drop the old file nowadays. Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20220412165836.355850-4-thuth@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2022-05-09disas: Remove old libopcode arm disassemblerThomas Huth
Capstone should be superior to the old libopcode disassembler, so we can drop the old file nowadays. Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20220412165836.355850-3-thuth@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2022-05-08lasi: move from hw/hppa to hw/miscMark Cave-Ayland
Move the LASI device implementation from hw/hppa to hw/misc so that it is located with all the other miscellaneous devices. Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Acked-by: Helge Deller <deller@gmx.de> Message-Id: <20220504092600.10048-43-mark.cave-ayland@ilande.co.uk> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
2022-05-08dino: move from hw/hppa to hw/pci-hostMark Cave-Ayland
Move the DINO device implementation from hw/hppa to hw/pci-host so that it is located with all the other PCI host bridges. Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Acked-by: Helge Deller <deller@gmx.de> Message-Id: <20220504092600.10048-23-mark.cave-ayland@ilande.co.uk> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
2022-05-07pc: remove -soundhw pcspkPaolo Bonzini
The pcspk device is the only user of the init_isa function, and the only -soundhw option which does not create a new device (it hacks into the PCSpkState by hand). Remove it, since it was deprecated. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-07build: move vhost-scsi configuration to KconfigPaolo Bonzini
vhost-scsi and vhost-user-scsi are two devices of their own; it should be possible to enable/disable them with --without-default-devices, not --without-default-features. Compute their default value in Kconfig to obtain the more intuitive behavior. Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-06vfio/common: Rename VFIOGuestIOMMU::iommu into ::iommu_mrYi Liu
Rename VFIOGuestIOMMU iommu field into iommu_mr. Then it becomes clearer it is an IOMMU memory region. no functional change intended Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20220502094223.36384-4-yi.l.liu@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-06sysemu: tpm: Add a stub function for TPM_IS_CRBEric Auger
In a subsequent patch, VFIO will need to recognize if a memory region owner is a TPM CRB device. Hence VFIO needs to use TPM_IS_CRB() even if CONFIG_TPM is unset. So let's add a stub function. Signed-off-by: Eric Auger <eric.auger@redhat.com> Suggested-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Stefan Berger <stefanb@linnux.ibm.com> Link: https://lore.kernel.org/r/20220506132510.1847942-2-eric.auger@redhat.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-05ppc/xive: Update the state of the External interrupt signalFrederic Barrat
When pulling or pushing an OS context from/to a CPU, we should re-evaluate the state of the External interrupt signal. Otherwise, we can end up catching the External interrupt exception in hypervisor mode, which is unexpected. The problem is best illustrated with the following scenario: 1. an External interrupt is raised while the guest is on the CPU. 2. before the guest can ack the External interrupt, an hypervisor interrupt is raised, for example the Hypervisor Decrementer or Hypervisor Virtualization interrupt. The hypervisor interrupt forces the guest to exit while the External interrupt is still pending. 3. the hypervisor handles the hypervisor interrupt. At this point, the External interrupt is still pending. So it's very likely to be delivered while the hypervisor is running. That's unexpected and can result in an infinite loop where the hypervisor catches the External interrupt, looks for an interrupt in its hypervisor queue, doesn't find any, exits the interrupt handler with the External interrupt still raised, repeat... The fix is simply to always lower the External interrupt signal when pulling an OS context. It means it needs to be raised again when re-pushing the OS context. Fortunately, it's already the case, as we now always call xive_tctx_ipb_update(), which will raise the signal if needed. Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Message-Id: <20220429071620.177142-3-fbarrat@linux.ibm.com> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-05-04Merge tag 'for-upstream' of git://repo.or.cz/qemu/kevin into stagingRichard Henderson
Block layer patches - Fix and re-enable GLOBAL_STATE_CODE assertions - vhost-user: Fixes for VHOST_USER_ADD/REM_MEM_REG - vmdk: Fix reopening bs->file - coroutine: use QEMU_DEFINE_STATIC_CO_TLS() - docs/qemu-img: Fix list of formats which implement check # -----BEGIN PGP SIGNATURE----- # # iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAmJyi+YRHGt3b2xmQHJl # ZGhhdC5jb20ACgkQfwmycsiPL9aphxAAt0sXEOWlcIeU87NOk+V30rRiBup0K/HZ # wqsE6e0EMbygmC2aS/xqNu3naQ/TMY6UaoVBWSpf0D3sK2GnWEJW8bjV05ObZBwp # 6QUgqljk1QAAVv0o2/nViAcV8mEW+OzZLveP+qxFRNlNGoJDsbGzWj939SHM13eu # ZD+/GGs/qXL3Gxp6adhOBjxbXYjvxm13F3pVjoyAugjMSqoSuCI0eXu1xkwXNHSP # /wqObH3dQSzIvEXfE/1BOp3ofZwvg+XzeZ6MM4I/lvHDZWuQBfCQcBYKL9mMNWGc # ijFEeolWt7hER50ik4XPvBmbj0jU2nPXQwo1XcFeWX3MSoNsha2jCZsz4LqzadIN # YijGQHmkfDRmG2LSoIGcgM7chdwj88K8pfMnrrTsVEB6Dl4QrK6FjXviL5mG+rcX # 5FbKpgRwm3fmtug7Ttpgm1LJQmwK5A3YPenPH+CC2FoK3Rje46ZoMwQR5PBuHvM1 # rg9RB01eGJQGrw5Rt3VFk7304O/yT2J5m96x6CMejx4CuGK78VpkBC54HixTTh0R # nXxLLZdqawVqwrPP6sE02FEajM931///nhU6fuN/832m3bYUsfM8SZNVqJh5xoex # SK8x/a5HnTIkp7kys3f4juc+jzb4Yvka8IBIZi/gqzoqf8POGKLBCTpEXa3lBWIT # gnCCnWWEdgY= # =ETW/ # -----END PGP SIGNATURE----- # gpg: Signature made Wed 04 May 2022 09:21:26 AM CDT # gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6 # gpg: issuer "kwolf@redhat.com" # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] * tag 'for-upstream' of git://repo.or.cz/qemu/kevin: coroutine-win32: use QEMU_DEFINE_STATIC_CO_TLS() coroutine: use QEMU_DEFINE_STATIC_CO_TLS() coroutine-ucontext: use QEMU_DEFINE_STATIC_CO_TLS() iotests/reopen-file: Test reopening file child block/vmdk: Fix reopening bs->file iotests: Add regression test for issue 945 Revert "main-loop: Disable GLOBAL_STATE_CODE() assertions" qcow2: Do not reopen data_file in invalidate_cache block: Classify bdrv_get_flags() as I/O function vhost-user: Don't pass file descriptor for VHOST_USER_REM_MEM_REG libvhost-user: Fix extra vu_add/rem_mem_reg reply docs/vhost-user: Clarifications for VHOST_USER_ADD/REM_MEM_REG qemu-img: properly list formats which have consistency check implemented Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2022-05-04Revert "main-loop: Disable GLOBAL_STATE_CODE() assertions"Hanna Reitz
This reverts commit b1c073490553f80594b903ceedfc7c1aef6b1b19. (We wanted to do so once the 7.1 tree opens, which has happened. The issue reported in https://gitlab.com/qemu-project/qemu/-/issues/945 should be fixed by the preceding patches.) Signed-off-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220427114057.36651-4-hreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-05-04block: Classify bdrv_get_flags() as I/O functionHanna Reitz
This function is safe to call in an I/O context, and qcow2_do_open() does so (invoked in an I/O context by qcow2_co_invalidate_cache()). Signed-off-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220427114057.36651-2-hreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>