author | Peter Maydell <peter.maydell@linaro.org> | 2019-05-30 15:08:00 +0100 |
---|---|---|
committer | Peter Maydell <peter.maydell@linaro.org> | 2019-05-30 15:08:00 +0100 |
commit | 60905286cb5150de854e08279bca7dfc4b549e91 (patch) | |
tree | 1d168061ed2308a88c0652e52d3227b65a08469b /docs/specs/ppc-xive.rst | |
parent | 48a8b399619cf3bb745a2e052f9fec142f14d75d (diff) | |
parent | ce4b1b56852ea741170ae85d3b8c0771c1ca7c9e (diff) |
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-4.1-20190529' into staging
ppc patch queue 2019-05-29
Next pull request against qemu-4.1. Highlights:
* KVM accelerated support for the XIVE interrupt controller in PAPR
guests
* A number of TCG vector fixes
* Fixes for the PReP / 40p machine
* Improvements to make check-tcg test coverage
Other than that it's just a bunch of assorted fixes, cleanups and
minor improvements.
This supersedes both the pull request dated 2019-05-21 and the one
dated 2019-05-22. I've dropped one hunk which I think may have caused
the check-tcg failure that Peter saw (by enabling the ppc64abi32
build, which I think has been broken for ages). I'm not entirely
certain, since I haven't reproduced exactly the same failure.
# gpg: Signature made Wed 29 May 2019 07:49:04 BST
# gpg: using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" [full]
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" [full]
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" [full]
# gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" [unknown]
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-4.1-20190529: (44 commits)
ppc/pnv: add dummy XSCOM registers for PRD initialization
ppc/pnv: introduce new skiboot platform properties
spapr: Don't migrate the hpt_maxpagesize cap to older machine types
spapr: change default interrupt mode to 'dual'
spapr/xive: fix multiple resets when using the 'dual' interrupt mode
docs: provide documentation on the POWER9 XIVE interrupt controller
spapr/irq: add KVM support to the 'dual' machine
ppc/xics: fix irq priority in ics_set_irq_type()
spapr/irq: initialize the IRQ device only once
spapr/irq: introduce a spapr_irq_init_device() helper
spapr: check for the activation of the KVM IRQ device
spapr: introduce routines to delete the KVM IRQ device
sysbus: add a sysbus_mmio_unmap() helper
spapr/xive: activate KVM support
spapr/xive: add migration support for KVM
spapr/xive: introduce a VM state change handler
spapr/xive: add state synchronization with KVM
spapr/xive: add hcall support when under KVM
spapr/xive: add KVM support
spapr: Print out extra hints when CAS negotiation of interrupt mode fails
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Diffstat (limited to 'docs/specs/ppc-xive.rst')
-rw-r--r-- | docs/specs/ppc-xive.rst | 199 |
1 files changed, 199 insertions, 0 deletions
diff --git a/docs/specs/ppc-xive.rst b/docs/specs/ppc-xive.rst
new file mode 100644
index 0000000000..b997dc0629
--- /dev/null
+++ b/docs/specs/ppc-xive.rst
@@ -0,0 +1,199 @@
+================================
+POWER9 XIVE interrupt controller
+================================
+
+The POWER9 processor comes with a new interrupt controller
+architecture, called XIVE for "eXternal Interrupt Virtualization
+Engine".
+
+Compared to the previous architecture, the main characteristics of
+XIVE are to support a larger number of interrupt sources and to
+deliver interrupts directly to virtual processors without hypervisor
+assistance. This removes the context switches required for the
+delivery process.
+
+
+XIVE architecture
+=================
+
+The XIVE IC is composed of three sub-engines, each taking care of a
+processing layer of external interrupts:
+
+- Interrupt Virtualization Source Engine (IVSE), or Source Controller
+  (SC). These are found in PCI PHBs, in the PSI host bridge
+  controller, but also inside the main controller for the core IPIs
+  and other sub-chips (NX, CAP, NPU) of the chip/processor. They are
+  configured to feed the IVRE with events.
+- Interrupt Virtualization Routing Engine (IVRE) or Virtualization
+  Controller (VC). It handles event coalescing and performs interrupt
+  routing by matching an event source number with an Event
+  Notification Descriptor (END).
+- Interrupt Virtualization Presentation Engine (IVPE) or Presentation
+  Controller (PC). It maintains the interrupt context state of each
+  thread and handles the delivery of the external interrupt to the
+  thread.
+
+::
+
+              XIVE Interrupt Controller
+              +------------------------------------+      IPIs
+              | +---------+ +---------+ +--------+ |    +-------+
+              | |IVRE     | |Common Q | |IVPE    |----> | CORES |
+              | |     esb | |         | |        |----> |       |
+              | |     eas | |  Bridge | |   tctx |----> |       |
+              | |SC   end | |         | |    nvt | |    |       |
+  +------+    | +---------+ +----+----+ +--------+ |    +-+-+-+-+
+  | RAM  |    +------------------|-----------------+      | | |
+  |      |                       |                        | | |
+  |      |                       |                        | | |
+  |      |  +--------------------v------------------------v-v-v--+    other
+  |      <--+                     Power Bus                      +--> chips
+  | esb  |  +---------+-----------------------+------------------+
+  | eas  |            |                       |
+  | end  |         +--|------+                 |
+  | nvt  |       +----+----+ |           +----+----+
+  +------+       |IVSE     | |           |IVSE     |
+                 |         | |           |         |
+                 | PQ-bits | |           | PQ-bits |
+                 | local   |-+           |  in VC  |
+                 +---------+             +---------+
+                    PCIe                 NX,NPU,CAPI
+
+
+  PQ-bits: 2 bits source state machine (P:pending Q:queued)
+  esb: Event State Buffer (Array of PQ bits in an IVSE)
+  eas: Event Assignment Structure
+  end: Event Notification Descriptor
+  nvt: Notification Virtual Target
+  tctx: Thread interrupt Context registers
+
+
+
+XIVE internal tables
+--------------------
+
+Each of the sub-engines uses a set of tables to redirect interrupts
+from event sources to CPU threads.
+
+::
+
+                                           +-------+
+   User or O/S                             |  EQ   |
+       or                          +------>|entries|
+   Hypervisor                      |       |  ..   |
+     Memory                        |       +-------+
+                                   |           ^
+                                   |           |
+            +-------------------------------------------------+
+                                   |           |
+   Hypervisor      +------+    +---+--+    +---+--+   +------+
+     Memory        | ESB  |    | EAT  |    | ENDT |   | NVTT |
+    (skiboot)      +----+-+    +----+-+    +----+-+   +------+
+                     ^  |        ^  |        ^  |       ^
+                     |  |        |  |        |  |       |
+            +-------------------------------------------------+
+                     |  |        |  |        |  |       |
+                     |  |        |  |        |  |       |
+                +----|--|--------|--|--------|--|-+   +-|-----+    +------+
+                |    |  |        |  |        |  | |   | | tctx|    |Thread|
+   IPI or    ---+    +  v        +  v        +  v |---| +  .. |----->     |
+  HW events     |                                 |   |       |    |      |
+                |              IVRE               |   | IVPE  |    +------+
+                +---------------------------------+   +-------+
+
+
+The IVSE has a 2-bit state machine per source, P for pending and Q for
+queued, that allows events to be triggered. The PQ bits are stored in
+an Event State Buffer (ESB) array and can be controlled by MMIOs.
+
+If the event is let through, the IVRE looks in the Event Assignment
+Structure (EAS) table for an Event Notification Descriptor (END)
+configured for the source. Each Event Notification Descriptor defines
+a notification path to a CPU and an in-memory Event Queue, in which
+EQ data is enqueued for the O/S to pull.
+
+The IVPE determines if a Notification Virtual Target (NVT) can handle
+the event by scanning the thread contexts of the VCPUs dispatched on
+the processor HW threads. It maintains the interrupt context state of
+each thread in an NVT table.
+
+XIVE thread interrupt context
+-----------------------------
+
+The XIVE presenter can generate four different exceptions to its
+HW threads:
+
+- hypervisor exception
+- O/S exception
+- Event-Based Branch (user level)
+- msgsnd (doorbell)
+
+Each exception has a state independent from the others called a Thread
+Interrupt Management context. This context is a set of registers which
+lets the thread handle priority management and interrupt
+acknowledgment among other things. The most important ones are:
+
+- Pending Interrupt Priority Register (PIPR)
+- Interrupt Pending Buffer (IPB)
+- Current Processor Priority (CPPR)
+- Notification Source Register (NSR)
+
+TIMA
+~~~~
+
+The Thread Interrupt Management registers are accessible through a
+specific MMIO region, called the Thread Interrupt Management Area
+(TIMA), made of four aligned pages, each exposing a different view of
+the registers. The first page (page address ending in ``0b00``) gives
+access to the entire context and is reserved for the ring 0 view for
+the physical thread context. The second (page address ending in ``0b01``)
+is for the hypervisor, ring 1 view. The third (page address ending in
+``0b10``) is for the operating system, ring 2 view. The fourth (page
+address ending in ``0b11``) is for user level, ring 3 view.
+
+Interrupt flow from an O/S perspective
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+After event data has been enqueued in the O/S Event Queue, the IVPE
+raises the bit corresponding to the priority of the pending interrupt
+in the register IPB (Interrupt Pending Buffer) to indicate that an
+event is pending in one of the 8 priority queues. The Pending
+Interrupt Priority Register (PIPR) is also updated using the IPB. This
+register represents the priority of the most favored pending
+notification.
+
+The PIPR is then compared to the Current Processor Priority
+Register (CPPR). If it is more favored (numerically less than), the
+CPU interrupt line is raised and the EO bit of the Notification Source
+Register (NSR) is updated to notify the presence of an exception for
+the O/S. The O/S acknowledges the interrupt with a special load in the
+Thread Interrupt Management Area.
+
+The O/S handles the interrupt and when done, performs an EOI using a
+MMIO operation on the ESB management page of the associated source.
+
+Overview of the QEMU models for XIVE
+====================================
+
+The XiveSource models the IVSE in general, internal and external. It
+handles the source ESBs and the MMIO interface to control them.
+
+The XiveNotifier is a small helper interface interconnecting the
+XiveSource to the XiveRouter.
+
+The XiveRouter is an abstract model acting as a combined IVRE and
+IVPE. It routes event notifications using the EAS and END tables to
+the IVPE sub-engine which does a CAM scan to find a CPU to deliver the
+exception. Storage should be provided by the inheriting classes.
+
+XiveEnDSource is a special source object. It exposes the END ESB MMIOs
+of the Event Queues which are used for coalescing event notifications
+and for escalation. It is not used in the field, only to sync the EQ
+cache in OPAL.
+
+Finally, the XiveTCTX contains the interrupt state context of a thread,
+four sets of registers, one for each exception that can be delivered
+to a CPU. These contexts are scanned by the IVPE to find a matching VP
+when a notification is triggered. It also models the Thread Interrupt
+Management Area (TIMA), which exposes the thread context registers to
+the CPU for interrupt management.
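
The "Interrupt flow from an O/S perspective" section of the added document can be illustrated with a small Python sketch. It is not the QEMU implementation: the class and function names are hypothetical, and the encoding is an assumption the document does not state (priority 0 is the most favored and maps to the most significant IPB bit, and an empty IPB yields a PIPR of 0xFF). Only the comparison "PIPR numerically less than CPPR" comes from the text itself.

```python
def priority_to_ipb(priority: int) -> int:
    """Return the IPB bit for a priority (assumed: priority 0 -> bit 0x80)."""
    if not 0 <= priority <= 7:
        raise ValueError("XIVE has 8 priority queues, numbered 0 to 7")
    return 0x80 >> priority


def ipb_to_pipr(ipb: int) -> int:
    """Derive the most favored pending priority from the IPB.

    Assumed convention: 0xFF means that nothing is pending.
    """
    if ipb == 0:
        return 0xFF
    # Number of leading zeros in an 8-bit value.
    return 8 - ipb.bit_length()


class ThreadContext:
    """Hypothetical model of one ring of a thread interrupt context."""

    def __init__(self, cppr: int = 0xFF) -> None:
        self.ipb = 0        # Interrupt Pending Buffer
        self.pipr = 0xFF    # Pending Interrupt Priority Register
        self.cppr = cppr    # Current Processor Priority Register

    def present(self, priority: int) -> bool:
        """Record a pending event; return True if the CPU line would be raised."""
        self.ipb |= priority_to_ipb(priority)
        self.pipr = ipb_to_pipr(self.ipb)
        # More favored means numerically less than the CPPR.
        return self.pipr < self.cppr


ctx = ThreadContext(cppr=4)
masked = ctx.present(6)     # less favored than the CPPR: stays pending
raised = ctx.present(2)     # more favored than the CPPR: line is raised
```

In this sketch the priority 6 event stays recorded in the IPB while masked, so it would become deliverable as soon as the CPPR is raised above 6. How the acknowledging load updates the CPPR and clears the IPB is left out because the document does not describe it.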