aboutsummaryrefslogtreecommitdiff
path: root/docs/specs/ppc-spapr-xive.rst
blob: 539ce7ca4e903e3082e5ff645e204ed4567a7bc6 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
XIVE for sPAPR (pseries machines)
=================================

The POWER9 processor comes with a new interrupt controller
architecture, called XIVE as "eXternal Interrupt Virtualization
Engine". It supports a larger number of interrupt sources and offers
virtualization features which enables the HW to deliver interrupts
directly to virtual processors without hypervisor assistance.

A QEMU ``pseries`` machine (which is PAPR compliant) using POWER9
processors can run under two interrupt modes:

- *Legacy Compatibility Mode*

  the hypervisor provides identical interfaces and similar
  functionality to PAPR+ Version 2.7.  This is the default mode

  It is also referred as *XICS* in QEMU.

- *XIVE native exploitation mode*

  the hypervisor provides new interfaces to manage the XIVE control
  structures, and provides direct control for interrupt management
  through MMIO pages.

Which interrupt modes can be used by the machine is negotiated with
the guest O/S during the Client Architecture Support negotiation
sequence. The two modes are mutually exclusive.

Both interrupt mode share the same IRQ number space. See below for the
layout.

CAS Negotiation
---------------

QEMU advertises the supported interrupt modes in the device tree
property "ibm,arch-vec-5-platform-support" in byte 23 and the OS
Selection for XIVE is indicated in the "ibm,architecture-vec-5"
property byte 23.

The interrupt modes supported by the machine depend on the CPU type
(POWER9 is required for XIVE) but also on the machine property
``ic-mode`` which can be set on the command line. It can take the
following values: ``xics``, ``xive``, ``dual`` and currently ``xics``
is the default but it may change in the future.

The choosen interrupt mode is activated after a reconfiguration done
in a machine reset.

XIVE Device tree properties
---------------------------

The properties for the PAPR interrupt controller node when the *XIVE
native exploitation mode* is selected shoud contain:

- ``device_type``

  value should be "power-ivpe".

- ``compatible``

  value should be "ibm,power-ivpe".

- ``reg``

  contains the base address and size of the thread interrupt
  managnement areas (TIMA), for the User level and for the Guest OS
  level. Only the Guest OS level is taken into account today.

- ``ibm,xive-eq-sizes``

  the size of the event queues. One cell per size supported, contains
  log2 of size, in ascending order.

- ``ibm,xive-lisn-ranges``

  the IRQ interrupt number ranges assigned to the guest for the IPIs.

The root node also exports :

- ``ibm,plat-res-int-priorities``

  contains a list of priorities that the hypervisor has reserved for
  its own use.

IRQ number space
----------------

IRQ Number space of the ``pseries`` machine is 8K wide and is the same
for both interrupt mode. The different ranges are defined as follow :

- ``0x0000 .. 0x0FFF`` 4K CPU IPIs (only used under XIVE)
- ``0x1000 .. 0x1000`` 1 EPOW
- ``0x1001 .. 0x1001`` 1 HOTPLUG
- ``0x1100 .. 0x11FF`` 256 VIO devices
- ``0x1200 .. 0x127F`` 32 PHBs devices
- ``0x1280 .. 0x12FF`` unused
- ``0x1300 .. 0x1FFF`` PHB MSIs

Monitoring XIVE
---------------

The state of the XIVE interrupt controller can be queried through the
monitor commands ``info pic``. The output comes in two parts.

First, the state of the thread interrupt context registers is dumped
for each CPU :

::

   (qemu) info pic
   CPU[0000]:   QW   NSR CPPR IPB LSMFB ACK# INC AGE PIPR  W2
   CPU[0000]: USER    00   00  00    00   00  00  00   00  00000000
   CPU[0000]:   OS    00   ff  00    00   ff  00  ff   ff  80000400
   CPU[0000]: POOL    00   00  00    00   00  00  00   00  00000000
   CPU[0000]: PHYS    00   00  00    00   00  00  00   ff  00000000
   ...

In the case of a ``pseries`` machine, QEMU acts as the hypervisor and only
the O/S and USER register rings make sense. ``W2`` contains the vCPU CAM
line which is set to the VP identifier.

Then comes the routing information which aggregates the EAS and the
END configuration:

::

   ...
   LISN         PQ    EISN     CPU/PRIO EQ
   00000000 MSI --    00000010   0/6    380/16384 @1fe3e0000 ^1 [ 80000010 ... ]
   00000001 MSI --    00000010   1/6    305/16384 @1fc230000 ^1 [ 80000010 ... ]
   00000002 MSI --    00000010   2/6    220/16384 @1fc2f0000 ^1 [ 80000010 ... ]
   00000003 MSI --    00000010   3/6    201/16384 @1fc390000 ^1 [ 80000010 ... ]
   00000004 MSI -Q  M 00000000
   00000005 MSI -Q  M 00000000
   00000006 MSI -Q  M 00000000
   00000007 MSI -Q  M 00000000
   00001000 MSI --    00000012   0/6    380/16384 @1fe3e0000 ^1 [ 80000010 ... ]
   00001001 MSI --    00000013   0/6    380/16384 @1fe3e0000 ^1 [ 80000010 ... ]
   00001100 MSI --    00000100   1/6    305/16384 @1fc230000 ^1 [ 80000010 ... ]
   00001101 MSI -Q  M 00000000
   00001200 LSI -Q  M 00000000
   00001201 LSI -Q  M 00000000
   00001202 LSI -Q  M 00000000
   00001203 LSI -Q  M 00000000
   00001300 MSI --    00000102   1/6    305/16384 @1fc230000 ^1 [ 80000010 ... ]
   00001301 MSI --    00000103   2/6    220/16384 @1fc2f0000 ^1 [ 80000010 ... ]
   00001302 MSI --    00000104   3/6    201/16384 @1fc390000 ^1 [ 80000010 ... ]

The source information and configuration:

- The ``LISN`` column outputs the interrupt number of the source in
  range ``[ 0x0 ... 0x1FFF ]`` and its type : ``MSI`` or ``LSI``
- The ``PQ`` column reflects the state of the PQ bits of the source :

  - ``--`` source is ready to take events
  - ``P-`` an event was sent and an EOI is PENDING
  - ``PQ`` an event was QUEUED
  - ``-Q`` source is OFF

  a ``M`` indicates that source is *MASKED* at the EAS level,

The targeting configuration :

- The ``EISN`` column is the event data that will be queued in the event
  queue of the O/S.
- The ``CPU/PRIO`` column is the tuple defining the CPU number and
  priority queue serving the source.
- The ``EQ`` column outputs :

  - the current index of the event queue/ the max number of entries
  - the O/S event queue address
  - the toggle bit
  - the last entries that were pushed in the event queue.