aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-01-04block/nbd-client: use traces instead of noisy error_report_errVladimir Sementsov-Ogievskiy
Reduce extra noise of nbd-client, change 083 correspondingly. In various commits (be41c100 in 2.10, f140e300 in 2.11, 78a33ab in 2.12), we added spots where qemu as an NBD client would report problems communicating with the server to stderr, because there was no where else to send the error to. However, this is racy, particularly since the most common source of these errors is when either the client or the server abruptly hangs up, leaving one coroutine to report the error only if it wins (or loses) the race in attempting the read from the server before another thread completes its cleanup of a protocol error that caused the disconnect in the first place. The race is also apparent in the fact that differences in the flush behavior of the server can alter the frequency of encountering the race in the client (see commit 6d39db96). Rather than polluting stderr, it's better to just trace these situations, for use by developers debugging a flaky connection, particularly since the real error that either triggers the abrupt disconnection in the first place, or that results from the EIO when a request can't receive a reply, DOES make it back to the user in the normal Error propagation channels. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20181102151152.288399-4-vsementsov@virtuozzo.com> [eblake: drop depedence on error hint, enhance commit message] Signed-off-by: Eric Blake <eblake@redhat.com>
2019-01-04nbd/client: Trace all server option error messagesEric Blake
Not all servers send free-form text alongside option error replies, but for servers that do (such as qemu), we pass the server's message as a hint alongside our own error reporting. However, it would also be useful to trace such server messages, since we can't guarantee how the hint may be consumed. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20181218225714.284495-3-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2019-01-04nbd: publish _lookup functionsVladimir Sementsov-Ogievskiy
These functions are used for formatting pretty trace points. We are going to add some in block/nbd-client, so, let's publish all these functions at once. Note, that nbd_reply_type_lookup is already published, and constants, "named" by these functions live in include/block/nbd.h too. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20181102151152.288399-3-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2018-12-22Merge remote-tracking branch 'remotes/marcel/tags/rdma-pull-request' into ↵Peter Maydell
staging RDMA queue * Add support for RDMA MAD * Various fixes for the pvrdma backend # gpg: Signature made Sat 22 Dec 2018 09:36:36 GMT # gpg: using RSA key 36D4C0F0CF2FE46D # gpg: Good signature from "Marcel Apfelbaum <marcel.apfelbaum@zoho.com>" # gpg: aka "Marcel Apfelbaum <marcel@redhat.com>" # gpg: aka "Marcel Apfelbaum <marcel.apfelbaum@gmail.com>" # gpg: WARNING: This key is not certified with sufficiently trusted signatures! # gpg: It is not certain that the signature belongs to the owner. # Primary key fingerprint: B1C6 3A57 F92E 08F2 640F 31F5 36D4 C0F0 CF2F E46D * remotes/marcel/tags/rdma-pull-request: (31 commits) pvrdma: check return value from pvrdma_idx_ring_has_ routines rdma: remove unused VENDOR_ERR_NO_SGE macro pvrdma: release ring object in case of an error pvrdma: check number of pages when creating rings pvrdma: add uar_read routine rdma: check num_sge does not exceed MAX_SGE pvrdma: release device resources in case of an error docs: Update pvrdma device documentation hw/rdma: Do not call rdma_backend_del_gid on an empty gid hw/rdma: Do not use bitmap_zero_extend to free bitmap hw/pvrdma: Clean device's resource when system is shutdown vl: Introduce shutdown_notifiers hw/rdma: Remove unneeded code that handles more that one port hw/pvrdma: Fill error code in command's response hw/pvrdma: Fill all CQE fields hw/pvrdma: Make device state depend on Ethernet function state hw/rdma: Initialize node_guid from vmxnet3 mac address hw/pvrdma: Make sure PCI function 0 is vmxnet3 vmxnet3: Move some definitions to header file hw/pvrdma: Add support to allow guest to configure GID table ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-12-22pvrdma: check return value from pvrdma_idx_ring_has_ routinesPrasad J Pandit
pvrdma_idx_ring_has_[data/space] routines also return invalid index PVRDMA_INVALID_IDX[=-1], if ring has no data/space. Check return value from these routines to avoid plausible infinite loops. Reported-by: Li Qiang <liq3ea@163.com> Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
2018-12-22rdma: remove unused VENDOR_ERR_NO_SGE macroPrasad J Pandit
With commit 4481985c (rdma: check num_sge does not exceed MAX_SGE) macro VENDOR_ERR_NO_SGE is no longer in use - delete it. Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
2018-12-22pvrdma: release ring object in case of an errorPrasad J Pandit
create_cq and create_qp routines allocate ring object, but it's not released in case of an error, leading to memory leakage. Reported-by: Li Qiang <liq3ea@163.com> Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
2018-12-22pvrdma: check number of pages when creating ringsPrasad J Pandit
When creating CQ/QP rings, an object can have up to PVRDMA_MAX_FAST_REG_PAGES 8 pages. Check 'npages' parameter to avoid excessive memory allocation or a null dereference. Reported-by: Li Qiang <liq3ea@163.com> Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
2018-12-22pvrdma: add uar_read routinePrasad J Pandit
Define skeleton 'uar_read' routine. Avoid NULL dereference. Reported-by: Li Qiang <liq3ea@163.com> Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org> Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
2018-12-22rdma: check num_sge does not exceed MAX_SGEPrasad J Pandit
rdma back-end has scatter/gather array ibv_sge[MAX_SGE=4] set to have 4 elements. A guest could send a 'PvrdmaSqWqe' ring element with 'num_sge' set to > MAX_SGE, which may lead to OOB access issue. Add check to avoid it. Reported-by: Saar Amar <saaramar5@gmail.com> Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
2018-12-22pvrdma: release device resources in case of an errorPrasad J Pandit
If during pvrdma device initialisation an error occurs, pvrdma_realize() does not release memory resources, leading to memory leakage. Reported-by: Li Qiang <liq3ea@163.com> Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org> Message-Id: <20181212175817.815-1-ppandit@redhat.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
2018-12-22docs: Update pvrdma device documentationYuval Shaia
Interface with the device is changed with the addition of support for MAD packets. Adjust documentation accordingly. While there fix a minor mistake which may lead to think that there is a relation between using RXE on host and the compatibility with bare-metal peers. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Marcel Apfelbaum<marcel.apfelbaum@gmail.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
2018-12-22hw/rdma: Do not call rdma_backend_del_gid on an empty gidYuval Shaia
When device goes down the function fini_ports loops over all entries in gid table regardless of the fact whether entry is valid or not. In case that entry is not valid we'd like to skip from any further processing in backend device. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
2018-12-22hw/rdma: Do not use bitmap_zero_extend to free bitmapYuval Shaia
bitmap_zero_extend is designed to work for extending, not for shrinking. Using g_free instead. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
2018-12-22hw/pvrdma: Clean device's resource when system is shutdownYuval Shaia
In order to clean some external resources such as GIDs, QPs etc, register to receive notification when VM is shutdown. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
2018-12-22vl: Introduce shutdown_notifiersYuval Shaia
Notifier will be used for signaling shutdown event to inform system is shutdown. This will allow devices and other component to run some cleanup code needed before VM is shutdown. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
2018-12-22hw/rdma: Remove unneeded code that handles more that one portYuval Shaia
Device supports only one port, let's remove a dead code that handles more than one port. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
2018-12-22hw/pvrdma: Fill error code in command's responseYuval Shaia
Driver checks error code let's set it. In addition, for code simplification purposes, set response's fields ack, response and err outside of the scope of command handlers. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
2018-12-22hw/pvrdma: Fill all CQE fieldsYuval Shaia
Add ability to pass specific WC attributes to CQE such as GRH_BIT flag. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
2018-12-22hw/pvrdma: Make device state depend on Ethernet function stateYuval Shaia
User should be able to control the device by changing Ethernet function state so if user runs 'ifconfig ens3 down' the PVRDMA function should be down as well. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
2018-12-22hw/rdma: Initialize node_guid from vmxnet3 mac addressYuval Shaia
node_guid should be set once device is load. Make node_guid be GID format (32 bit) of PCI function 0 vmxnet3 device's MAC. A new function was added to do the conversion. So for example the MAC 56:b6:44:e9:62:dc will be converted to GID 54b6:44ff:fee9:62dc. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
2018-12-22hw/pvrdma: Make sure PCI function 0 is vmxnet3Yuval Shaia
Guest driver enforces it, we should also. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
2018-12-22vmxnet3: Move some definitions to header fileYuval Shaia
pvrdma setup requires vmxnet3 device on PCI function 0 and PVRDMA device on PCI function 1. pvrdma device needs to access vmxnet3 device object for several reasons: 1. Make sure PCI function 0 is vmxnet3. 2. To monitor vmxnet3 device state. 3. To configure node_guid accoring to vmxnet3 device's MAC address. To be able to access vmxnet3 device the definition of VMXNET3State is moved to a new header file. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Dmitry Fleytman <dmitry.fleytman@gmail.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
2018-12-22hw/pvrdma: Add support to allow guest to configure GID tableYuval Shaia
The control over the RDMA device's GID table is done by updating the device's Ethernet function addresses. Usually the first GID entry is determined by the MAC address, the second by the first IPv6 address and the third by the IPv4 address. Other entries can be added by adding more IP addresses. The opposite is the same, i.e. whenever an address is removed, the corresponding GID entry is removed. The process is done by the network and RDMA stacks. Whenever an address is added the ib_core driver is notified and calls the device driver add_gid function which in turn update the device. To support this in pvrdma device we need to hook into the create_bind and destroy_bind HW commands triggered by pvrdma driver in guest. Whenever a change is made to the pvrdma port's GID table a special QMP message is sent to be processed by libvirt to update the address of the backend Ethernet device. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Marcel Apfelbaum<marcel.apfelbaum@gmail.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
2018-12-22qapi: Define new QMP message for pvrdmaYuval Shaia
pvrdma requires that the same GID attached to it will be attached to the backend device in the host. A new QMP messages is defined so pvrdma device can broadcast any change made to its GID table. This event is captured by libvirt which in turn will update the GID table in the backend device. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Acked-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
2018-12-22hw/pvrdma: Set the correct opcode for send completionYuval Shaia
opcode for WC should be set by the device and not taken from work element. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
2018-12-22hw/pvrdma: Set the correct opcode for recv completionYuval Shaia
The function pvrdma_post_cqe populates CQE entry with opcode from the given completion element. For receive operation value was not set. Fix it by setting it to IBV_WC_RECV. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
2018-12-22hw/pvrdma: Make default pkey 0xFFFFYuval Shaia
Commit 6e7dba23af ("hw/pvrdma: Make default pkey 0xFFFF") exports default pkey as external definition but omit the change from 0x7FFF to 0xFFFF. Fixes: 6e7dba23af ("hw/pvrdma: Make default pkey 0xFFFF") Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
2018-12-22hw/pvrdma: Make function reset_device return voidYuval Shaia
This function cannot fail - fix it to return void Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
2018-12-22hw/rdma: Add support for MAD packetsYuval Shaia
MAD (Management Datagram) packets are widely used by various modules both in kernel and in user space for example the rdma_* API which is used to create and maintain "connection" layer on top of RDMA uses several types of MAD packets. For more information please refer to chapter 13.4 in Volume 1 Architecture Specification, Release 1.1 available here: https://www.infinibandta.org/ibta-specifications-download/ To support MAD packets the device uses an external utility (contrib/rdmacm-mux) to relay packets from and to the guest driver. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Marcel Apfelbaum<marcel.apfelbaum@gmail.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
2018-12-22hw/rdma: Abort send-op if fail to create addr handlerYuval Shaia
Function create_ah might return NULL, let's exit with an error. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
2018-12-22hw/rdma: Return qpn 1 if ibqp is NULLYuval Shaia
Device is not supporting QP0, only QP1. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
2018-12-22hw/rdma: Add ability to force notification without re-armYuval Shaia
Upon completion of incoming packet the device pushes CQE to driver's RX ring and notify the driver (msix). While for data-path incoming packets the driver needs the ability to control whether it wished to receive interrupts or not, for control-path packets such as incoming MAD the driver needs to be notified anyway, it even do not need to re-arm the notification bit. Enhance the notification field to support this. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
2018-12-22contrib/rdmacm-mux: Add implementation of RDMA User MAD multiplexerYuval Shaia
RDMA MAD kernel module (ibcm) disallow more than one MAD-agent for a given MAD class. This does not go hand-by-hand with qemu pvrdma device's requirements where each VM is MAD agent. Fix it by adding implementation of RDMA MAD multiplexer service which on one hand register as a sole MAD agent with the kernel module and on the other hand gives service to more than one VM. Design Overview: Reviewed-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> ---------------- A server process is registered to UMAD framework (for this to work the rdma_cm kernel module needs to be unloaded) and creates a unix socket to listen to incoming request from clients. A client process (such as QEMU) connects to this unix socket and registers with its own GID. TX: ---- When client needs to send rdma_cm MAD message it construct it the same way as without this multiplexer, i.e. creates a umad packet but this time it writes its content to the socket instead of calling umad_send(). The server, upon receiving such a message fetch local_comm_id from it so a context for this session can be maintain and relay the message to UMAD layer by calling umad_send(). RX: ---- The server creates a worker thread to process incoming rdma_cm MAD messages. When an incoming message arrived (umad_recv()) the server, depending on the message type (attr_id) looks for target client by either searching in gid->fd table or in local_comm_id->fd table. With the extracted fd the server relays to incoming message to the client. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
2018-12-22hw/pvrdma: Check the correct return valueYuval Shaia
Return value of 0 means ok, we want to free the memory only in case of error. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Message-Id: <20181025061700.17050-1-yuval.shaia@oracle.com> Reviewed-by: Marcel Apfelbaum<marcel.apfelbaum@gmail.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
2018-12-21Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-4.0-20181221' ↵Peter Maydell
into staging ppc patch queue 2018-12-21 This pull request supersedes the one from 2018-12-13. This is a revised first ppc pull request for qemu-4.0. Highlights are: * Most of the code for the POWER9 "XIVE" interrupt controller (not complete yet, but we're getting there) * A number of g_new vs. g_malloc cleanups * Some IRQ wiring cleanups * A fix for how we advertise NUMA nodes to the guest for pseries # gpg: Signature made Fri 21 Dec 2018 05:34:12 GMT # gpg: using RSA key 6C38CACA20D9B392 # gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" # gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" # gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" # gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" # Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392 * remotes/dgibson/tags/ppc-for-4.0-20181221: (40 commits) MAINTAINERS: PPC: add a XIVE section spapr: change default CPU type to POWER9 spapr: introduce an 'ic-mode' machine option spapr: add an extra OV5 field to the sPAPR IRQ backend spapr: add a 'reset' method to the sPAPR IRQ backend spapr: extend the sPAPR IRQ backend for XICS migration spapr: allocate the interrupt thread context under the CPU core spapr: add device tree support for the XIVE exploitation mode spapr: add hcalls support for the XIVE exploitation interrupt mode spapr: introduce a new machine IRQ backend for XIVE spapr-iommu: Always advertise the maximum possible DMA window size spapr/xive: use the VCPU id as a NVT identifier spapr/xive: introduce a XIVE interrupt controller ppc/xive: notify the CPU when the interrupt priority is more privileged ppc/xive: introduce a simplified XIVE presenter ppc/xive: introduce the XIVE interrupt thread context ppc/xive: add support for the END Event State Buffers Changes requirement for "vsubsbs" instruction spapr: export and rename the xics_max_server_number() routine spapr: introduce a spapr_irq_init() routine ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-12-21Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into stagingPeter Maydell
pci, pc, virtio: fixes, features VTD fixes IR and split irqchip are now the default for Q35 ACPI refactoring hotplug refactoring new names for virtio devices multiple pcie link width/speeds PCI fixes Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # gpg: Signature made Thu 20 Dec 2018 18:26:03 GMT # gpg: using RSA key 281F0DB8D28D5469 # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * remotes/mst/tags/for_upstream: (44 commits) x86-iommu: turn on IR by default if proper x86-iommu: switch intr_supported to OnOffAuto type q35: set split kernel irqchip as default pci: Adjust PCI config limit based on bus topology spapr_pci: perform unplug via the hotplug handler pci/shpc: perform unplug via the hotplug handler pci: Reuse pci-bridge hotplug handler handlers for pcie-pci-bridge pci/pcie: perform unplug via the hotplug handler pci/pcihp: perform unplug via the hotplug handler pci/pcihp: overwrite hotplug handler recursively from the start pci/pcihp: perform check for bus capability in pre_plug handler s390x/pci: rename hotplug handler callbacks pci/shpc: rename hotplug handler callbacks pci/pcie: rename hotplug handler callbacks hw/i386: Remove deprecated machines pc-0.10 and pc-0.11 hw: acpi: Remove AcpiRsdpDescriptor and fix tests hw: acpi: Export and share the ARM RSDP build hw: arm: Support both legacy and current RSDP build hw: arm: Convert the RSDP build to the buid_append_foo() API hw: arm: Carry RSDP specific data through AcpiRsdpData ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-12-21MAINTAINERS: PPC: add a XIVE sectionCédric Le Goater
Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21spapr: change default CPU type to POWER9Cédric Le Goater
Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21spapr: introduce an 'ic-mode' machine optionCédric Le Goater
This option is used to select the interrupt controller mode (XICS or XIVE) with which the machine will operate. XICS being the default mode for now. When running a machine with the XIVE interrupt mode backend, the guest OS is required to have support for the XIVE exploitation mode. In the case of legacy OS, the mode selected by CAS should be XICS and the OS should fail to boot. However, QEMU could possibly detect it, terminate the boot process and reset to stop in the SLOF firmware. This is not yet handled. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21spapr: add an extra OV5 field to the sPAPR IRQ backendCédric Le Goater
The interrupt modes supported by the hypervisor are advertised to the guest with new bits definitions of the option vector 5 of property "ibm,arch-vec-5-platform-support. The byte 23 bits 0-1 of the OV5 are defined as follow : 0b00 PAPR 2.7 and earlier (Legacy systems) 0b01 XIVE Exploitation mode only 0b10 Either available If the client/guest selects the XIVE interrupt mode, it informs the hypervisor by returning the value 0b01 in byte 23 bits 0-1. A 0b00 value indicates the use of the XICS interrupt mode (Legacy systems). The sPAPR IRQ backend is extended with these definitions and the values are directly used to populate the "ibm,arch-vec-5-platform-support" property. The interrupt mode is advertised under TCG and under KVM. Although a KVM XIVE device is not yet available, the machine can still operate with kernel_irqchip=off. However, we apply a restriction on the CPU which is required to be a POWER9 when a XIVE interrupt controller is in use. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21spapr: add a 'reset' method to the sPAPR IRQ backendCédric Le Goater
For the time being, the XIVE reset handler updates the OS CAM line of the vCPU as it is done under a real hypervisor when a vCPU is scheduled to run on a HW thread. This will let the XIVE presenter engine find a match among the NVTs dispatched on the HW threads. This handler will become even more useful when we introduce the machine supporting both interrupt modes, XIVE and XICS. In this machine, the interrupt mode is chosen by the CAS negotiation process and activated after a reset. Signed-off-by: Cédric Le Goater <clg@kaod.org> [dwg: Fix style nits] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21spapr: extend the sPAPR IRQ backend for XICS migrationCédric Le Goater
Introduce a new sPAPR IRQ handler to handle resend after migration when the machine is using a KVM XICS interrupt controller model. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21spapr: allocate the interrupt thread context under the CPU coreCédric Le Goater
Each interrupt mode has its own specific interrupt presenter object, that we store under the CPU object, one for XICS and one for XIVE. Extend the sPAPR IRQ backend with a new handler to support them both. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21spapr: add device tree support for the XIVE exploitation modeCédric Le Goater
The XIVE interface for the guest is described in the device tree under the "interrupt-controller" node. A couple of new properties are specific to XIVE : - "reg" contains the base address and size of the thread interrupt managnement areas (TIMA), for the User level and for the Guest OS level. Only the Guest OS level is taken into account today. - "ibm,xive-eq-sizes" the size of the event queues. One cell per size supported, contains log2 of size, in ascending order. - "ibm,xive-lisn-ranges" the IRQ interrupt number ranges assigned to the guest for the IPIs. and also under the root node : - "ibm,plat-res-int-priorities" contains a list of priorities that the hypervisor has reserved for its own use. OPAL uses the priority 7 queue to automatically escalate interrupts for all other queues (DD2.X POWER9). So only priorities [0..6] are allowed for the guest. Extend the sPAPR IRQ backend with a new handler to populate the DT with the appropriate "interrupt-controller" node. Signed-off-by: Cédric Le Goater <clg@kaod.org> [dwg: Fix style nits] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21spapr: add hcalls support for the XIVE exploitation interrupt modeCédric Le Goater
The different XIVE virtualization structures (sources and event queues) are configured with a set of Hypervisor calls : - H_INT_GET_SOURCE_INFO used to obtain the address of the MMIO page of the Event State Buffer (ESB) entry associated with the source. - H_INT_SET_SOURCE_CONFIG assigns a source to a "target". - H_INT_GET_SOURCE_CONFIG determines which "target" and "priority" is assigned to a source - H_INT_GET_QUEUE_INFO returns the address of the notification management page associated with the specified "target" and "priority". - H_INT_SET_QUEUE_CONFIG sets or resets the event queue for a given "target" and "priority". It is also used to set the notification configuration associated with the queue, only unconditional notification is supported for the moment. Reset is performed with a queue size of 0 and queueing is disabled in that case. - H_INT_GET_QUEUE_CONFIG returns the queue settings for a given "target" and "priority". - H_INT_RESET resets all of the guest's internal interrupt structures to their initial state, losing all configuration set via the hcalls H_INT_SET_SOURCE_CONFIG and H_INT_SET_QUEUE_CONFIG. - H_INT_SYNC issue a synchronisation on a source to make sure all notifications have reached their queue. Calls that still need to be addressed : H_INT_SET_OS_REPORTING_LINE H_INT_GET_OS_REPORTING_LINE See the code for more documentation on each hcall. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> [dwg: Folded in fix for field accessors] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21spapr: introduce a new machine IRQ backend for XIVECédric Le Goater
The XIVE IRQ backend uses the same layout as the new XICS backend but covers the full range of the IRQ number space. The IRQ numbers for the CPU IPIs are allocated at the bottom of this space, below 4K, to preserve compatibility with XICS which does not use that range. This should be enough given that the maximum number of CPUs is 1024 for the sPAPR machine under QEMU. For the record, the biggest POWER8 or POWER9 system has a maximum of 1536 HW threads (16 sockets, 192 cores, SMT8). Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21spapr-iommu: Always advertise the maximum possible DMA window sizeAlexey Kardashevskiy
When deciding about the huge DMA window, the typical Linux pseries guest uses the maximum allowed RAM size as the upper limit. We did the same on QEMU side to match that logic. Now we are going to support a GPU RAM pass through which is not available at the guest boot time as it requires the guest driver interaction. As the result, the guest requests a smaller window than it should. Therefore the guest needs to be patched to understand this new memory and so does QEMU. Instead of reimplementing here whatever solution we choose for the guest, this advertises the biggest possible window size limited by 32 bit (as defined by LoPAPR). Since the window size has to be power-of-two (the create rtas call receives a window shift, not a size), this uses 0x8000.0000 as the maximum number of TCEs possible (rather than 32bit maximum of 0xffff.ffff). This is safe as: 1. The guest visible emulated table is allocated in KVM (actual pages are allocated in page fault handler) and QEMU (actual pages are allocated when updated); 2. The hardware table (and corresponding userspace address table) supports sparse allocation and also checks for locked_vm limit so it is unable to cause the host any damage. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21spapr/xive: use the VCPU id as a NVT identifierCédric Le Goater
The IVPE scans the O/S CAM line of the XIVE thread interrupt contexts to find a matching Notification Virtual Target (NVT) among the NVTs dispatched on the HW processor threads. On a real system, the thread interrupt contexts are updated by the hypervisor when a Virtual Processor is scheduled to run on a HW thread. Under QEMU, the model will emulate the same behavior by hardwiring the NVT identifier in the thread context registers at reset. The NVT identifier used by the sPAPRXive model is the VCPU id. The END identifier is also derived from the VCPU id. A set of helpers doing the conversion between identifiers are provided for the hcalls configuring the sources and the ENDs. The model does not need a NVT table but the XiveRouter NVT operations are provided to perform some extra checks in the routing algorithm. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21spapr/xive: introduce a XIVE interrupt controllerCédric Le Goater
sPAPRXive models the XIVE interrupt controller of the sPAPR machine. It inherits from the XiveRouter and provisions storage for the routing tables : - Event Assignment Structure (EAS) - Event Notification Descriptor (END) The sPAPRXive model incorporates an internal XiveSource for the IPIs and for the interrupts of the virtual devices of the guest. This model is consistent with XIVE architecture which also incorporates an internal IVSE for IPIs and accelerator interrupts in the IVRE sub-engine. The sPAPRXive model exports two memory regions, one for the ESB trigger and management pages used to control the sources and one for the TIMA pages. They are mapped by default at the addresses found on chip 0 of a baremetal system. This is also consistent with the XIVE architecture which defines a Virtualization Controller BAR for the internal IVSE ESB pages and a Thread Managment BAR for the TIMA. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> [dwg: Fold in field accessor fixes] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>