author Peter Maydell <peter.maydell@linaro.org> 2022-01-18 19:43:33 +0000
committer Peter Maydell <peter.maydell@linaro.org> 2022-01-18 19:43:33 +0000
commit 0dabdd6b3a7ead1183d6f26eaded7d0c332e4cc7
tree 23a05f5d199c5677fa5573bd0d010f675ef7b52e /docs
parent 8b846207151955a7d4de2d33d07645991824e345
parent ba49190107ee9803fb2f336b15283b457384b178
Merge remote-tracking branch 'remotes/legoater/tags/pull-ppc-20220118' into staging
ppc 7.0 queue:

* More documentation updates (Leonardo)
* Fixes for the 7448 CPU (Fabiano and Cedric)
* Final removal of 403 CPUs and the .load_state_old handler (Cedric)
* More cleanups of PHB4 models (Daniel and Cedric)

# gpg: Signature made Tue 18 Jan 2022 11:59:16 GMT
# gpg: using RSA key A0F66548F04895EBFE6B0B6051A343C7CFFBECA1
# gpg: Good signature from "Cédric Le Goater <clg@kaod.org>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: A0F6 6548 F048 95EB FE6B 0B60 51A3 43C7 CFFB ECA1

* remotes/legoater/tags/pull-ppc-20220118: (31 commits)
  ppc/pnv: Remove PHB4 version property
  ppc/pnv: Add a 'rp_model' class attribute for the PHB4 PEC
  ppc/pnv: Move root port allocation under pnv_pec_default_phb_realize()
  ppc/pnv: rename pnv_pec_stk_update_map()
  ppc/pnv: remove PnvPhb4PecStack object
  ppc/pnv: make PECs create and realize PHB4s
  ppc/pnv: remove PnvPhb4PecStack::stack_no
  ppc/pnv: move default_phb_realize() to pec_realize()
  ppc/pnv: remove stack pointer from PnvPHB4
  ppc/pnv: reduce stack->stack_no usage
  ppc/pnv: introduce PnvPHB4 'pec' property
  ppc/pnv: move phb_regs_mr to PnvPHB4
  ppc/pnv: move nest_regs_mr to PnvPHB4
  ppc/pnv: change pnv_pec_stk_update_map() to use PnvPHB4
  ppc/pnv: move nest_regs[] to PnvPHB4
  ppc/pnv: move mmbar0/mmbar1 and friends to PnvPHB4
  ppc/pnv: change pnv_phb4_update_regions() to use PnvPHB4
  ppc/pnv: move intbar to PnvPHB4
  ppc/pnv: move phbbar to PnvPHB4
  ppc/pnv: move PCI registers to PnvPHB4
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Diffstat (limited to 'docs')
-rw-r--r-- docs/specs/ppc-spapr-hotplug.rst | 510
-rw-r--r-- docs/specs/ppc-spapr-hotplug.txt | 409
-rw-r--r-- docs/specs/ppc-spapr-uv-hcalls.rst | 89
-rw-r--r-- docs/specs/ppc-spapr-uv-hcalls.txt | 76
-rw-r--r-- docs/system/ppc/pseries.rst | 8
5 files changed, 601 insertions, 491 deletions
diff --git a/docs/specs/ppc-spapr-hotplug.rst b/docs/specs/ppc-spapr-hotplug.rst
new file mode 100644
index 0000000000..f84dc55ad9
--- /dev/null
+++ b/docs/specs/ppc-spapr-hotplug.rst
@@ -0,0 +1,510 @@
+=============================
+sPAPR Dynamic Reconfiguration
+=============================
+
+sPAPR or pSeries guests make use of a facility called dynamic reconfiguration
+to handle hot plugging of dynamic "physical" resources like PCI cards, or
+"logical"/para-virtual resources like memory, CPUs, and "physical"
+host-bridges, which are generally managed by the host/hypervisor and provided
+to guests as virtualized resources. The specifics of dynamic reconfiguration
+are documented extensively in section 13 of the Linux on Power Architecture
+Reference document ([LoPAR]_). This document provides a summary of that
+information as it applies to the implementation within QEMU.
+
+Dynamic-reconfiguration Connectors
+==================================
+
+To manage hot plug/unplug of these resources, a firmware abstraction known as
+a Dynamic Resource Connector (DRC) is used to assign a particular dynamic
+resource to the guest, and provide an interface for the guest to manage
+configuration/removal of the resource associated with it.
+
+Device tree description of DRCs
+===============================
+
+A set of four Open Firmware device tree array properties are used to describe
+the name/index/power-domain/type of each DRC allocated to a guest at
+boot time. There may be multiple sets of these arrays, rooted at different
+paths in the device tree depending on the type of resource the DRCs manage.
+
+In some cases, the DRCs themselves may be provided by a dynamic resource,
+such as the DRCs managing PCI slots on a hot plugged PHB. In this case the
+arrays would be fetched as part of the device tree retrieval interfaces
+for hot plugged resources described under :ref:`guest-host-interface`.
+
+The array properties are described below. Each entry/element in an array
+describes the DRC identified by the element in the corresponding position
+of ``ibm,drc-indexes``:
+
+``ibm,drc-names``
+-----------------
+
+ First 4-bytes: big-endian (BE) encoded integer denoting the number of entries.
+
+ Each entry: a NULL-terminated ``<name>`` string encoded as a byte array.
+
+ ``<name>`` values for logical/virtual resources are defined in the Linux on
+ Power Architecture Reference ([LoPAR]_) section 13.5.2.4, and basically
+ consist of the type of the resource followed by a space and a numerical
+ value that's unique across resources of that type.
+
+ ``<name>`` values for "physical" resources such as PCI or VIO devices are
+ defined as being "location codes", which are the "location labels" of each
+ encapsulating device, starting from the chassis down to the individual slot
+ for the device, concatenated by a hyphen. This provides a mapping of
+ resources to a physical location in a chassis for debugging purposes. For
+ QEMU, this mapping is less important, so we assign a location code that
+ conforms to naming specifications, but is simply a location label for the
+ slot by itself to simplify the implementation. The naming convention for
+ location labels is documented in detail in the [LoPAR]_ section 12.3.1.5,
+ and in our case amounts to using ``C<n>`` for PCI/VIO device slots, where
+ ``<n>`` is unique across all PCI/VIO device slots.
+
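Reviewer note: the ``ibm,drc-names`` layout described above (a BE entry count followed by NUL-terminated strings) can be walked as in the following minimal sketch. It assumes the raw property value is already available as a byte buffer; the helper names `be32_read` and `drc_names_get` are hypothetical and are not taken from QEMU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a big-endian 32-bit integer from a byte buffer. */
static inline uint32_t be32_read(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Return the n-th (0-based) name stored in an ibm,drc-names style
 * property, or NULL if n is out of range or the buffer is malformed.
 * Hypothetical helper for illustration only.
 */
static inline const char *drc_names_get(const uint8_t *prop, size_t len,
                                        uint32_t n)
{
    assert(prop != NULL);
    if (len < 4) {
        return NULL;
    }
    uint32_t count = be32_read(prop);   /* first 4 bytes: entry count */
    if (n >= count) {
        return NULL;
    }
    size_t off = 4;
    for (uint32_t i = 0; ; i++) {
        /* Each entry must be NUL-terminated inside the buffer. */
        const uint8_t *end = memchr(prop + off, '\0', len - off);
        if (end == NULL) {
            return NULL;
        }
        if (i == n) {
            return (const char *)(prop + off);
        }
        off = (size_t)(end - prop) + 1;
    }
}
```

The same position `n` can then be used to pick the matching element of ``ibm,drc-indexes``, since the arrays are described as parallel.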
+``ibm,drc-indexes``
+-------------------
+
+ First 4-bytes: BE-encoded integer denoting the number of entries.
+
+ Each 4-byte entry: BE-encoded ``<index>`` integer that is unique across all
+ DRCs in the machine.
+
+ ``<index>`` is arbitrary, but in the case of QEMU we try to maintain the
+ convention used to assign them to pSeries guests on pHyp (the hypervisor
+ portion of PowerVM):
+
+ ``bit[31:28]``: integer encoding of ``<type>``, where ``<type>`` is:
+
+ ``1`` for CPU resource.
+
+ ``2`` for PHB resource.
+
+ ``3`` for VIO resource.
+
+ ``4`` for PCI resource.
+
+ ``8`` for memory resource.
+
+ ``bit[27:0]``: integer encoding of ``<id>``, where ``<id>`` is unique
+ across all resources of specified type.
+
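Reviewer note: the ``bit[31:28]``/``bit[27:0]`` convention above can be expressed with a few helpers. This is a sketch only; the enum and function names are hypothetical, and only the type codes and bit positions come from the text.

```c
#include <assert.h>
#include <stdint.h>

/* Type codes from the bit[31:28] list above (names are illustrative). */
enum drc_type_code {
    DRC_TYPE_CODE_CPU = 1,
    DRC_TYPE_CODE_PHB = 2,
    DRC_TYPE_CODE_VIO = 3,
    DRC_TYPE_CODE_PCI = 4,
    DRC_TYPE_CODE_MEM = 8,
};

#define DRC_INDEX_TYPE_SHIFT 28
#define DRC_INDEX_ID_MASK    0x0fffffffu

/* Compose a DRC index from a type code and a per-type id. */
static inline uint32_t drc_index_make(enum drc_type_code type, uint32_t id)
{
    assert(id <= DRC_INDEX_ID_MASK);    /* id must fit in bit[27:0] */
    return ((uint32_t)type << DRC_INDEX_TYPE_SHIFT) | id;
}

/* Extract the type code stored in bit[31:28]. */
static inline uint32_t drc_index_type(uint32_t index)
{
    return index >> DRC_INDEX_TYPE_SHIFT;
}

/* Extract the per-type id stored in bit[27:0]. */
static inline uint32_t drc_index_id(uint32_t index)
{
    return index & DRC_INDEX_ID_MASK;
}
```

For example, under this convention the memory resource with id 3 gets the index `0x80000003`.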
+``ibm,drc-power-domains``
+-------------------------
+
+ First 4-bytes: BE-encoded integer denoting the number of entries.
+
+ Each 4-byte entry: 32-bit, BE-encoded ``<index>`` integer that specifies the
+ power domain the resource will be assigned to. In the case of QEMU we
+ associate all resources with a "live insertion" domain, where the power is
+ assumed to be managed automatically. The integer value for this domain is a
+ special value of ``-1``.
+
+
+``ibm,drc-types``
+-----------------
+
+ First 4-bytes: BE-encoded integer denoting the number of entries.
+
+ Each entry: a NULL-terminated ``<type>`` string encoded as a byte array.
+ ``<type>`` is assigned as follows:
+
+ "CPU" for a CPU.
+
+ "PHB" for a physical host-bridge.
+
+ "SLOT" for a VIO slot.
+
+ "28" for a PCI slot.
+
+ "MEM" for memory resource.
+
+.. _guest-host-interface:
+
+Guest->Host interface to manage dynamic resources
+=================================================
+
+Each DRC is given a globally unique DRC index, and resources associated with a
+particular DRC are configured/managed by the guest via a number of RTAS calls
+which reference individual DRCs based on the DRC index. This can be considered
+the guest->host interface.
+
+``rtas-set-power-level``
+------------------------
+
+Set the power level for a specified power domain.
+
+ ``arg[0]``: integer identifying power domain.
+
+ ``arg[1]``: new power level for the domain, ``0-100``.
+
+ ``output[0]``: status, ``0`` on success.
+
+ ``output[1]``: power level after command.
+
+``rtas-get-power-level``
+------------------------
+
+Get the power level for a specified power domain.
+
+ ``arg[0]``: integer identifying power domain.
+
+ ``output[0]``: status, ``0`` on success.
+
+ ``output[1]``: current power level.
+
+``rtas-set-indicator``
+----------------------
+
+Set the state of an indicator or sensor.
+
+ ``arg[0]``: integer identifying sensor/indicator type.
+
+ ``arg[1]``: index of sensor, for DR-related sensors this is generally the DRC
+ index.
+
+ ``arg[2]``: desired sensor value.
+
+ ``output[0]``: status, ``0`` on success.
+
+For the purpose of this document we focus on the indicator/sensor types
+associated with a DRC. The types are:
+
+* ``9001``: ``isolation-state``, controls/indicates whether a device has been
+ made accessible to a guest. Supported sensor values:
+
+ ``0``: ``isolate``, device is made inaccessible by guest OS.
+
+ ``1``: ``unisolate``, device is made available to guest OS.
+
+* ``9002``: ``dr-indicator``, controls "visual" indicator associated with
+ device. Supported sensor values:
+
+ ``0``: ``inactive``, resource may be safely removed.
+
+ ``1``: ``active``, resource is in use and cannot be safely removed.
+
+ ``2``: ``identify``, used to visually identify slot for interactive hot plug.
+
+ ``3``: ``action``, in most cases, used in the same manner as identify.
+
+* ``9003``: ``allocation-state``, generally only used for "logical" DR resources
+ to request the allocation/deallocation of a resource prior to acquiring it via
+ ``isolation-state->unisolate``, or after releasing it via
+ ``isolation-state->isolate``, respectively. For "physical" DR (like PCI
+ hot plug/unplug) the pre-allocation of the resource is implied and this sensor
+ is unused. Supported sensor values:
+
+ ``0``: ``unusable``, tell firmware/system the resource can be
+ unallocated/reclaimed and added back to the system resource pool.
+
+ ``1``: ``usable``, request the resource be allocated/reserved for use by
+ guest OS.
+
+ ``2``: ``exchange``, used to allocate a spare resource to use for fail-over
+ in certain situations. Unused in QEMU.
+
+ ``3``: ``recover``, used to reclaim a previously allocated resource that's
+ not currently allocated to the guest OS. Unused in QEMU.
+
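Reviewer note: for a "logical" resource, the ordering described above (``allocation-state`` before ``unisolate`` when acquiring, ``isolate`` before ``unusable`` when releasing) can be sketched as two small step lists. The struct and function names are hypothetical; the numeric type and value codes are the ones listed above. A real guest performs further steps, such as fetching the device tree, which this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Indicator types and values listed above (names are illustrative). */
enum { IND_ISOLATION_STATE = 9001, IND_DR_INDICATOR = 9002,
       IND_ALLOCATION_STATE = 9003 };
enum { ISOLATION_ISOLATE = 0, ISOLATION_UNISOLATE = 1 };
enum { ALLOCATION_UNUSABLE = 0, ALLOCATION_USABLE = 1 };

/* Arguments of one rtas-set-indicator call: arg[0], arg[1], arg[2]. */
struct set_indicator_args {
    uint32_t type;
    uint32_t index;
    uint32_t value;
};

/* Acquire: request allocation first, then unisolate. Returns step count. */
static inline size_t logical_acquire_steps(uint32_t drc_index,
                                           struct set_indicator_args steps[2])
{
    assert(steps != NULL);
    steps[0] = (struct set_indicator_args){ IND_ALLOCATION_STATE, drc_index,
                                            ALLOCATION_USABLE };
    steps[1] = (struct set_indicator_args){ IND_ISOLATION_STATE, drc_index,
                                            ISOLATION_UNISOLATE };
    return 2;
}

/* Release: isolate first, then hand the resource back to the pool. */
static inline size_t logical_release_steps(uint32_t drc_index,
                                           struct set_indicator_args steps[2])
{
    assert(steps != NULL);
    steps[0] = (struct set_indicator_args){ IND_ISOLATION_STATE, drc_index,
                                            ISOLATION_ISOLATE };
    steps[1] = (struct set_indicator_args){ IND_ALLOCATION_STATE, drc_index,
                                            ALLOCATION_UNUSABLE };
    return 2;
}
```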
+``rtas-get-sensor-state``
+-------------------------
+
+Used to read an indicator or sensor value.
+
+ ``arg[0]``: integer identifying sensor/indicator type.
+
+ ``arg[1]``: index of sensor, for DR-related sensors this is generally the DRC
+ index.
+
+ ``output[0]``: status, ``0`` on success.
+
+For DR-related operations, the only noteworthy sensor is ``dr-entity-sense``,
+which has a type value of ``9003``, as ``allocation-state`` does in the case of
+``rtas-set-indicator``. The semantics/encodings of the sensor values are
+distinct however.
+
+Supported sensor values for ``dr-entity-sense`` (``9003``) sensor:
+
+ ``0``: empty.
+
+ For physical resources: DRC/slot is empty.
+
+ For logical resources: unused.
+
+ ``1``: present.
+
+ For physical resources: DRC/slot is populated with a device/resource.
+
+ For logical resources: resource has been allocated to the DRC.
+
+ ``2``: unusable.
+
+ For physical resources: unused.
+
+ For logical resources: DRC has no resource allocated to it.
+
+ ``3``: exchange.
+
+ For physical resources: unused.
+
+ For logical resources: resource available for exchange (see
+ ``allocation-state`` sensor semantics above).
+
+ ``4``: recovery.
+
+ For physical resources: unused.
+
+ For logical resources: resource available for recovery (see
+ ``allocation-state`` sensor semantics above).
+
+``rtas-ibm-configure-connector``
+--------------------------------
+
+Used to fetch an OpenFirmware device tree description of the resource associated
+with a particular DRC.
+
+ ``arg[0]``: guest physical address of 4096-byte work area buffer.
+
+ ``arg[1]``: 0, or address of additional 4096-byte work area buffer; only
+ non-zero if a prior RTAS response indicated a need for additional memory.
+
+ ``output[0]``: status:
+
+ ``0``: completed transmittal of device tree node.
+
+ ``1``: instruct guest to prepare for next device tree sibling node.
+
+ ``2``: instruct guest to prepare for next device tree child node.
+
+ ``3``: instruct guest to prepare for next device tree property.
+
+ ``4``: instruct guest to ascend to parent device tree node.
+
+ ``5``: instruct guest to provide additional work-area buffer via ``arg[1]``.
+
+ ``990x``: instruct guest that operation took too long and to try again
+ later.
+
+The DRC index is encoded in the first 4-bytes of the first work area buffer.
+Work area (``wa``) layout, using 4-byte offsets:
+
+ ``wa[0]``: DRC index of the DRC to fetch device tree nodes from.
+
+ ``wa[1]``: ``0`` (hard-coded).
+
+ ``wa[2]``:
+
+ For next-sibling/next-child response:
+
+ ``wa`` offset of null-terminated string denoting the new node's name.
+
+ For next-property response:
+
+ ``wa`` offset of null-terminated string denoting new property's name.
+
+ ``wa[3]``: for next-property response (unused otherwise):
+
+ Byte-length of new property's value.
+
+ ``wa[4]``: for next-property response (unused otherwise):
+
+ New property's value, encoded as an OFDT-compatible byte array.
+
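Reviewer note: a guest walking the ``rtas-ibm-configure-connector`` responses has to track where it is in the device tree. The sketch below only models that bookkeeping from the status codes listed above; the enum and function names are hypothetical, and treating ``990x`` as the range 9900 to 9909 is an assumption about how the ``x`` digit is used.

```c
#include <assert.h>
#include <stdint.h>

/* Status codes from the output[0] list above (names are illustrative). */
enum cc_status {
    CC_SUCCESS       = 0,   /* completed transmittal of device tree node */
    CC_NEXT_SIBLING  = 1,
    CC_NEXT_CHILD    = 2,
    CC_NEXT_PROPERTY = 3,
    CC_PREV_PARENT   = 4,
    CC_MORE_MEMORY   = 5,
};

/* True for the "took too long, try again later" family (990x). */
static inline int cc_status_is_busy(int32_t status)
{
    return status >= 9900 && status <= 9909;
}

/*
 * Return the tree depth the guest is at after handling one response:
 * a child response descends, a parent response ascends, and every
 * other status leaves the depth unchanged.
 */
static inline int cc_depth_after(int depth, int32_t status)
{
    assert(depth >= 0);
    switch (status) {
    case CC_NEXT_CHILD:
        return depth + 1;
    case CC_PREV_PARENT:
        assert(depth > 0);  /* cannot ascend above the starting node */
        return depth - 1;
    default:
        return depth;
    }
}
```

A caller would typically repeat the RTAS call until the status is ``0``, retrying the same call later whenever `cc_status_is_busy()` is true.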
+Hot plug/unplug events
+======================
+
+For most DR operations, the hypervisor will issue host->guest add/remove events
+using the EPOW/check-exception notification framework, where the host issues a
+check-exception interrupt, then provides an RTAS event log via an
+rtas-check-exception call issued by the guest in response. This framework is
+documented by PAPR+ v2.7, and is already in use by QEMU for generating powerdown
+requests via EPOW events.
+
+For DR, this framework has been extended to include hotplug events, which were
+previously unneeded due to direct manipulation of DR-related guest userspace
+tools by host-level management such as an HMC. This level of management is not
+applicable to KVM on Power, hence the reason for extending the notification
+framework to support hotplug events.
+
+The format for these EPOW-signalled events is described below under
+:ref:`hot-plug-unplug-event-structure`. Note that these events are not formally
+part of the PAPR+ specification, and have been superseded by a newer format,
+also described below under :ref:`hot-plug-unplug-event-structure`, and so are
+now deemed a "legacy" format. The formats are similar, but the "modern" format
+contains additional fields/flags, which are denoted for the purposes of this
+documentation with ``#ifdef GUEST_SUPPORTS_MODERN`` guards.
+
+QEMU should assume support only for "legacy" fields/flags unless the guest
+advertises support for the "modern" format via
+``ibm,client-architecture-support`` hcall by setting byte 5, bit 6 of its
+``ibm,architecture-vec-5`` option vector structure (as described by [LoPAR]_,
+section B.5.2.3). As with "legacy" format events, "modern" format events are
+surfaced to the guest via check-exception RTAS calls, but use a dedicated event
+source to signal the guest. This event source is advertised to the guest by the
+addition of a ``hot-plug-events`` node under ``/event-sources`` node of the
+guest's device tree using the standard format described in [LoPAR]_,
+section B.5.12.2.
+
+.. _hot-plug-unplug-event-structure:
+
+Hot plug/unplug event structure
+===============================
+
+The hot plug specific payload in QEMU is implemented as follows (with all values
+encoded in big-endian format):
+
+.. code-block:: c
+
+    struct rtas_event_log_v6_hp {
+    #define SECTION_ID_HOTPLUG              0x4850 /* HP */
+        struct section_header {
+            uint16_t section_id;            /* set to SECTION_ID_HOTPLUG */
+            uint16_t section_length;        /* sizeof(rtas_event_log_v6_hp),
+                                             * plus the length of the DRC name
+                                             * if a DRC name identifier is
+                                             * specified for hotplug_identifier
+                                             */
+            uint8_t section_version;        /* version 1 */
+            uint8_t section_subtype;        /* unused */
+            uint16_t creator_component_id;  /* unused */
+        } hdr;
+    #define RTAS_LOG_V6_HP_TYPE_CPU         1
+    #define RTAS_LOG_V6_HP_TYPE_MEMORY      2
+    #define RTAS_LOG_V6_HP_TYPE_SLOT        3
+    #define RTAS_LOG_V6_HP_TYPE_PHB         4
+    #define RTAS_LOG_V6_HP_TYPE_PCI         5
+        uint8_t hotplug_type;               /* type of resource/device */
+    #define RTAS_LOG_V6_HP_ACTION_ADD       1
+    #define RTAS_LOG_V6_HP_ACTION_REMOVE    2
+        uint8_t hotplug_action;             /* action (add/remove) */
+    #define RTAS_LOG_V6_HP_ID_DRC_NAME          1
+    #define RTAS_LOG_V6_HP_ID_DRC_INDEX         2
+    #define RTAS_LOG_V6_HP_ID_DRC_COUNT         3
+    #ifdef GUEST_SUPPORTS_MODERN
+    #define RTAS_LOG_V6_HP_ID_DRC_COUNT_INDEXED 4
+    #endif
+        uint8_t hotplug_identifier;         /* type of the resource identifier,
+                                             * which serves as the discriminator
+                                             * for the 'drc' union field below
+                                             */
+    #ifdef GUEST_SUPPORTS_MODERN
+        uint8_t capabilities;               /* capability flags, currently unused
+                                             * by QEMU
+                                             */
+    #else
+        uint8_t reserved;
+    #endif
+        union {
+            uint32_t index;                 /* DRC index of resource to take action
+                                             * on
+                                             */
+            uint32_t count;                 /* number of DR resources to take
+                                             * action on (guest chooses which)
+                                             */
+    #ifdef GUEST_SUPPORTS_MODERN
+            struct {
+                uint32_t count;             /* number of DR resources to take
+                                             * action on
+                                             */
+                uint32_t index;             /* DRC index of first resource to take
+                                             * action on. guest will take action
+                                             * on DRC index <index> through
+                                             * DRC index <index + count - 1> in
+                                             * sequential order
+                                             */
+            } count_indexed;
+    #endif
+            char name[1];                   /* string representing the name of the
+                                             * DRC to take action on
+                                             */
+        } drc;
+    } QEMU_PACKED;
+
+``ibm,lrdr-capacity``
+=====================
+
+``ibm,lrdr-capacity`` is a property in the ``/rtas`` device tree node that
+identifies the dynamic reconfiguration capabilities of the guest. It is a
+triple of ``<phys>``, ``<size>`` and ``<maxcpus>``.
+
+ ``<phys>``, encoded in BE format, represents the maximum address in bytes and
+ hence the maximum memory that can be allocated to the guest.
+
+ ``<size>``, encoded in BE format, represents the size increments in which
+ memory can be hot-plugged to the guest.
+
+ ``<maxcpus>``, a BE-encoded integer, represents the maximum number of
+ processors that the guest can have.
+
+``pseries`` guests use this property to note the maximum allowed CPUs for the
+guest.
+
+``ibm,dynamic-reconfiguration-memory``
+======================================
+
+``ibm,dynamic-reconfiguration-memory`` is a device tree node that represents
+dynamically reconfigurable logical memory blocks (LMB). This node is generated
+only when the guest advertises the support for it via
+``ibm,client-architecture-support`` call. Memory that is not dynamically
+reconfigurable is represented by ``/memory`` nodes. The properties of this node
+that are of interest to the sPAPR memory hotplug implementation in QEMU are
+described here.
+
+``ibm,lmb-size``
+----------------
+
+This 64-bit integer defines the size of each dynamically reconfigurable LMB.
+
+``ibm,associativity-lookup-arrays``
+-----------------------------------
+
+This property defines a lookup array in which the NUMA associativity
+information for each LMB can be found. It is a property-encoded array
+that begins with an integer M, the number of associativity lists, followed
+by an integer N, the number of entries per associativity list, and ends
+with M associativity lists, each N integers long.
+
+This property provides the same information as given by ``ibm,associativity``
+property in a ``/memory`` node. Each assigned LMB has an index value between
+0 and M-1 which is used as an index into this table to select which
+associativity list to use for the LMB. This index value for each LMB is defined
+in ``ibm,dynamic-memory`` property.
+
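Reviewer note: selecting an associativity list by index, as described above, amounts to skipping the two header integers and then `idx` lists of N integers each. The sketch below assumes every integer in the property is a 32-bit BE cell, which the text does not state explicitly; `be32_read` and `assoc_list_get` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian 32-bit integer from a byte buffer. */
static inline uint32_t be32_read(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Return a pointer to associativity list number idx (0 .. M-1) and
 * store the list length N in *n_out. Returns NULL if idx is out of
 * range or the list does not fit inside the buffer.
 */
static inline const uint8_t *assoc_list_get(const uint8_t *prop, size_t len,
                                            uint32_t idx, uint32_t *n_out)
{
    assert(prop != NULL && n_out != NULL);
    if (len < 8) {
        return NULL;
    }
    uint32_t m = be32_read(prop);       /* number of lists */
    uint32_t n = be32_read(prop + 4);   /* entries per list */
    if (idx >= m) {
        return NULL;
    }
    /* Only accept lists that are completely contained in the buffer. */
    if (n != 0 && idx >= (len - 8) / 4 / n) {
        return NULL;
    }
    *n_out = n;
    return prop + 8 + (size_t)idx * n * 4;
}
```

Each of the N entries of the returned list can then be read with `be32_read(list + 4 * i)`.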
+``ibm,dynamic-memory``
+----------------------
+
+This property describes the dynamically reconfigurable memory. It is a
+property encoded array that has an integer N, the number of LMBs followed
+by N LMB list entries.
+
+Each LMB list entry consists of the following elements:
+
+- Logical address of the start of the LMB encoded as a 64-bit integer. This
+ corresponds to ``reg`` property in ``/memory`` node.
+- DRC index of the LMB that corresponds to ``ibm,my-drc-index`` property
+ in a ``/memory`` node.
+- Four bytes reserved for expansion.
+- Associativity list index for the LMB that is used as an index into
+ ``ibm,associativity-lookup-arrays`` property described earlier. This is used
+ to retrieve the right associativity list to be used for this LMB.
+- A 32-bit flags word. The bit at bit position ``0x00000008`` defines whether
+ the LMB is assigned to the partition as of boot time.
+
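Reviewer note: one LMB list entry as listed above can be decoded as in this sketch. It assumes the DRC index and the associativity list index are 32-bit BE integers (the text gives explicit widths only for the address, the reserved bytes and the flags word), which makes each entry 24 bytes long. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LMB_ENTRY_SIZE    24            /* 8 + 4 + 4 + 4 + 4 bytes */
#define LMB_FLAG_ASSIGNED 0x00000008u   /* assigned to the partition at boot */

struct lmb_entry {
    uint64_t addr;          /* logical start address of the LMB */
    uint32_t drc_index;     /* DRC index of the LMB */
    uint32_t assoc_index;   /* index into ibm,associativity-lookup-arrays */
    uint32_t flags;
};

/* Read a big-endian 32-bit integer from a byte buffer. */
static inline uint32_t be32_read(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode one LMB_ENTRY_SIZE-byte entry starting at p. */
static inline void lmb_entry_parse(const uint8_t *p, struct lmb_entry *e)
{
    assert(p != NULL && e != NULL);
    e->addr = ((uint64_t)be32_read(p) << 32) | be32_read(p + 4);
    e->drc_index = be32_read(p + 8);
    /* p + 12 .. p + 15: four bytes reserved for expansion, skipped */
    e->assoc_index = be32_read(p + 16);
    e->flags = be32_read(p + 20);
}

/* True if the LMB is assigned to the partition as of boot time. */
static inline int lmb_entry_is_assigned(const struct lmb_entry *e)
{
    return (e->flags & LMB_FLAG_ASSIGNED) != 0;
}
```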
+``ibm,dynamic-memory-v2``
+-------------------------
+
+This property describes the dynamically reconfigurable memory. This is
+an alternate and newer way to describe dynamically reconfigurable memory.
+It is a property encoded array that has an integer N (the number of
+LMB set entries) followed by N LMB set entries. There is an LMB set entry
+for each sequential group of LMBs that share common attributes.
+
+Each LMB set entry consists of the following elements:
+
+- Number of sequential LMBs in the entry represented by a 32-bit integer.
+- Logical address of the first LMB in the set encoded as a 64-bit integer.
+- DRC index of the first LMB in the set.
+- Associativity list index that is used as an index into
+ ``ibm,associativity-lookup-arrays`` property described earlier. This
+ is used to retrieve the right associativity list to be used for all
+ the LMBs in this set.
+- A 32-bit flags word that applies to all the LMBs in the set.
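
Reviewer note: an LMB set entry can be expanded back into individual LMBs. The sketch below assumes that consecutive LMBs in a set are ``ibm,lmb-size`` bytes apart and have consecutive DRC indexes; the text says only that the set is "sequential", so both points are assumptions, as are all names used here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One decoded ibm,dynamic-memory-v2 set entry (host byte order). */
struct lmb_set {
    uint32_t count;             /* number of sequential LMBs in the set */
    uint64_t base_addr;         /* logical address of the first LMB */
    uint32_t first_drc_index;   /* DRC index of the first LMB */
    uint32_t assoc_index;       /* shared associativity list index */
    uint32_t flags;             /* shared flags word */
};

/* One individual LMB derived from a set. */
struct lmb {
    uint64_t addr;
    uint32_t drc_index;
    uint32_t assoc_index;
    uint32_t flags;
};

/*
 * Fill *out with LMB number i (0-based) of the set. Returns 0 on
 * success and -1 if i is outside the set.
 */
static inline int lmb_set_member(const struct lmb_set *s, uint64_t lmb_size,
                                 uint32_t i, struct lmb *out)
{
    assert(s != NULL && out != NULL);
    if (i >= s->count) {
        return -1;
    }
    out->addr = s->base_addr + (uint64_t)i * lmb_size;
    out->drc_index = s->first_drc_index + i;
    out->assoc_index = s->assoc_index;
    out->flags = s->flags;
    return 0;
}
```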
diff --git a/docs/specs/ppc-spapr-hotplug.txt b/docs/specs/ppc-spapr-hotplug.txt
deleted file mode 100644
index d4fb2d46d9..0000000000
--- a/docs/specs/ppc-spapr-hotplug.txt
+++ /dev/null
@@ -1,409 +0,0 @@
-= sPAPR Dynamic Reconfiguration =
-
-sPAPR/"pseries" guests make use of a facility called dynamic-reconfiguration
-to handle hotplugging of dynamic "physical" resources like PCI cards, or
-"logical"/paravirtual resources like memory, CPUs, and "physical"
-host-bridges, which are generally managed by the host/hypervisor and provided
-to guests as virtualized resources. The specifics of dynamic-reconfiguration
-are documented extensively in PAPR+ v2.7, Section 13.1. This document
-provides a summary of that information as it applies to the implementation
-within QEMU.
-
-== Dynamic-reconfiguration Connectors ==
-
-To manage hotplug/unplug of these resources, a firmware abstraction known as
-a Dynamic Resource Connector (DRC) is used to assign a particular dynamic
-resource to the guest, and provide an interface for the guest to manage
-configuration/removal of the resource associated with it.
-
-== Device-tree description of DRCs ==
-
-A set of 4 Open Firmware device tree array properties are used to describe
-the name/index/power-domain/type of each DRC allocated to a guest at
-boot-time. There may be multiple sets of these arrays, rooted at different
-paths in the device tree depending on the type of resource the DRCs manage.
-
-In some cases, the DRCs themselves may be provided by a dynamic resource,
-such as the DRCs managing PCI slots on a hotplugged PHB. In this case the
-arrays would be fetched as part of the device tree retrieval interfaces
-for hotplugged resources described under "Guest->Host interface".
-
-The array properties are described below. Each entry/element in an array
-describes the DRC identified by the element in the corresponding position
-of ibm,drc-indexes:
-
-ibm,drc-names:
- first 4-bytes: BE-encoded integer denoting the number of entries
- each entry: a NULL-terminated <name> string encoded as a byte array
-
- <name> values for logical/virtual resources are defined in PAPR+ v2.7,
- Section 13.5.2.4, and basically consist of the type of the resource
- followed by a space and a numerical value that's unique across resources
- of that type.
-
- <name> values for "physical" resources such as PCI or VIO devices are
- defined as being "location codes", which are the "location labels" of
- each encapsulating device, starting from the chassis down to the
- individual slot for the device, concatenated by a hyphen. This provides
- a mapping of resources to a physical location in a chassis for debugging
- purposes. For QEMU, this mapping is less important, so we assign a
- location code that conforms to naming specifications, but is simply a
- location label for the slot by itself to simplify the implementation.
- The naming convention for location labels is documented in detail in
- PAPR+ v2.7, Section 12.3.1.5, and in our case amounts to using "C<n>"
- for PCI/VIO device slots, where <n> is unique across all PCI/VIO
- device slots.
-
-ibm,drc-indexes:
- first 4-bytes: BE-encoded integer denoting the number of entries
- each 4-byte entry: BE-encoded <index> integer that is unique across all DRCs
- in the machine
-
- <index> is arbitrary, but in the case of QEMU we try to maintain the
- convention used to assign them to pSeries guests on pHyp:
-
- bit[31:28]: integer encoding of <type>, where <type> is:
- 1 for CPU resource
- 2 for PHB resource
- 3 for VIO resource
- 4 for PCI resource
- 8 for Memory resource
- bit[27:0]: integer encoding of <id>, where <id> is unique across
- all resources of specified type
-
-ibm,drc-power-domains:
- first 4-bytes: BE-encoded integer denoting the number of entries
- each 4-byte entry: 32-bit, BE-encoded <index> integer that specifies the
- power domain the resource will be assigned to. In the case of QEMU
- we associated all resources with a "live insertion" domain, where the
- power is assumed to be managed automatically. The integer value for
- this domain is a special value of -1.
-
-
-ibm,drc-types:
- first 4-bytes: BE-encoded integer denoting the number of entries
- each entry: a NULL-terminated <type> string encoded as a byte array
-
- <type> is assigned as follows:
- "CPU" for a CPU
- "PHB" for a physical host-bridge
- "SLOT" for a VIO slot
- "28" for a PCI slot
- "MEM" for memory resource
-
-== Guest->Host interface to manage dynamic resources ==
-
-Each DRC is given a globally unique DRC Index, and resources associated with
-a particular DRC are configured/managed by the guest via a number of RTAS
-calls which reference individual DRCs based on the DRC index. This can be
-considered the guest->host interface.
-
-rtas-set-power-level:
- arg[0]: integer identifying power domain
- arg[1]: new power level for the domain, 0-100
- output[0]: status, 0 on success
- output[1]: power level after command
-
- Set the power level for a specified power domain
-
-rtas-get-power-level:
- arg[0]: integer identifying power domain
- output[0]: status, 0 on success
- output[1]: current power level
-
- Get the power level for a specified power domain
-
-rtas-set-indicator:
- arg[0]: integer identifying sensor/indicator type
- arg[1]: index of sensor, for DR-related sensors this is generally the
- DRC index
- arg[2]: desired sensor value
- output[0]: status, 0 on success
-
- Set the state of an indicator or sensor. For the purpose of this document we
- focus on the indicator/sensor types associated with a DRC. The types are:
-
- 9001: isolation-state, controls/indicates whether a device has been made
- accessible to a guest
-
- supported sensor values:
- 0: isolate, device is made unaccessible by guest OS
- 1: unisolate, device is made available to guest OS
-
- 9002: dr-indicator, controls "visual" indicator associated with device
-
- supported sensor values:
- 0: inactive, resource may be safely removed
- 1: active, resource is in use and cannot be safely removed
- 2: identify, used to visually identify slot for interactive hotplug
- 3: action, in most cases, used in the same manner as identify
-
- 9003: allocation-state, generally only used for "logical" DR resources to
- request the allocation/deallocation of a resource prior to acquiring
- it via isolation-state->unisolate, or after releasing it via
- isolation-state->isolate, respectively. for "physical" DR (like PCI
- hotplug/unplug) the pre-allocation of the resource is implied and
- this sensor is unused.
-
- supported sensor values:
- 0: unusable, tell firmware/system the resource can be
- unallocated/reclaimed and added back to the system resource pool
- 1: usable, request the resource be allocated/reserved for use by
- guest OS
- 2: exchange, used to allocate a spare resource to use for fail-over
- in certain situations. unused in QEMU
- 3: recover, used to reclaim a previously allocated resource that's
- not currently allocated to the guest OS. unused in QEMU
-
-rtas-get-sensor-state:
- arg[0]: integer identifying sensor/indicator type
- arg[1]: index of sensor, for DR-related sensors this is generally the
- DRC index
- output[0]: status, 0 on success
-
- Used to read an indicator or sensor value.
-
- For DR-related operations, the only noteworthy sensor is dr-entity-sense,
- which has a type value of 9003, as allocation-state does in the case of
- rtas-set-indicator. The semantics/encodings of the sensor values are distinct
- however:
-
- supported sensor values for dr-entity-sense (9003) sensor:
- 0: empty,
- for physical resources: DRC/slot is empty
- for logical resources: unused
- 1: present,
- for physical resources: DRC/slot is populated with a device/resource
- for logical resources: resource has been allocated to the DRC
- 2: unusable,
- for physical resources: unused
- for logical resources: DRC has no resource allocated to it
- 3: exchange,
- for physical resources: unused
- for logical resources: resource available for exchange (see
- allocation-state sensor semantics above)
- 4: recovery,
- for physical resources: unused
- for logical resources: resource available for recovery (see
- allocation-state sensor semantics above)
-
-rtas-ibm-configure-connector:
- arg[0]: guest physical address of 4096-byte work area buffer
- arg[1]: 0, or address of additional 4096-byte work area buffer. only non-zero
- if a prior RTAS response indicated a need for additional memory
- output[0]: status:
- 0: completed transmittal of device-tree node
- 1: instruct guest to prepare for next DT sibling node
- 2: instruct guest to prepare for next DT child node
- 3: instruct guest to prepare for next DT property
- 4: instruct guest to ascend to parent DT node
- 5: instruct guest to provide additional work-area buffer
- via arg[1]
- 990x: instruct guest that operation took too long and to try
- again later
-
- Used to fetch an OF device-tree description of the resource associated with
- a particular DRC. The DRC index is encoded in the first 4-bytes of the first
- work area buffer.
-
- Work area layout, using 4-byte offsets:
- wa[0]: DRC index of the DRC to fetch device-tree nodes from
- wa[1]: 0 (hard-coded)
- wa[2]: for next-sibling/next-child response:
- wa offset of null-terminated string denoting the new node's name
- for next-property response:
- wa offset of null-terminated string denoting new property's name
- wa[3]: for next-property response (unused otherwise):
- byte-length of new property's value
- wa[4]: for next-property response (unused otherwise):
- new property's value, encoded as an OFDT-compatible byte array
-
-== hotplug/unplug events ==
-
-For most DR operations, the hypervisor will issue host->guest add/remove events
-using the EPOW/check-exception notification framework, where the host issues a
-check-exception interrupt, then provides an RTAS event log via an
-rtas-check-exception call issued by the guest in response. This framework is
-documented by PAPR+ v2.7, and already use in by QEMU for generating powerdown
-requests via EPOW events.
-
-For DR, this framework has been extended to include hotplug events, which were
-previously unneeded due to direct manipulation of DR-related guest userspace
-tools by host-level management such as an HMC. This level of management is not
-applicable to PowerKVM, hence the reason for extending the notification
-framework to support hotplug events.
-
-The format for these EPOW-signalled events is described below under
-"hotplug/unplug event structure". Note that these events are not
-formally part of the PAPR+ specification, and have been superseded by a
-newer format, also described below under "hotplug/unplug event structure",
-and so are now deemed a "legacy" format. The formats are similar, but the
-"modern" format contains additional fields/flags, which are denoted for the
-purposes of this documentation with "#ifdef GUEST_SUPPORTS_MODERN" guards.
-
-QEMU should assume support only for "legacy" fields/flags unless the guest
-advertises support for the "modern" format via ibm,client-architecture-support
-hcall by setting byte 5, bit 6 of it's ibm,architecture-vec-5 option vector
-structure (as described by LoPAPR v11, B.6.2.3). As with "legacy" format events,
-"modern" format events are surfaced to the guest via check-exception RTAS calls,
-but use a dedicated event source to signal the guest. This event source is
-advertised to the guest by the addition of a "hot-plug-events" node under
-"/event-sources" node of the guest's device tree using the standard format
-described in LoPAPR v11, B.6.12.1.
-
-== hotplug/unplug event structure ==
-
-The hotplug-specific payload in QEMU is implemented as follows (with all values
-encoded in big-endian format):
-
-struct rtas_event_log_v6_hp {
-#define SECTION_ID_HOTPLUG 0x4850 /* HP */
- struct section_header {
- uint16_t section_id; /* set to SECTION_ID_HOTPLUG */
- uint16_t section_length; /* sizeof(rtas_event_log_v6_hp),
- * plus the length of the DRC name
- * if a DRC name identifier is
- * specified for hotplug_identifier
- */
- uint8_t section_version; /* version 1 */
- uint8_t section_subtype; /* unused */
- uint16_t creator_component_id; /* unused */
- } hdr;
-#define RTAS_LOG_V6_HP_TYPE_CPU 1
-#define RTAS_LOG_V6_HP_TYPE_MEMORY 2
-#define RTAS_LOG_V6_HP_TYPE_SLOT 3
-#define RTAS_LOG_V6_HP_TYPE_PHB 4
-#define RTAS_LOG_V6_HP_TYPE_PCI 5
- uint8_t hotplug_type; /* type of resource/device */
-#define RTAS_LOG_V6_HP_ACTION_ADD 1
-#define RTAS_LOG_V6_HP_ACTION_REMOVE 2
- uint8_t hotplug_action; /* action (add/remove) */
-#define RTAS_LOG_V6_HP_ID_DRC_NAME 1
-#define RTAS_LOG_V6_HP_ID_DRC_INDEX 2
-#define RTAS_LOG_V6_HP_ID_DRC_COUNT 3
-#ifdef GUEST_SUPPORTS_MODERN
-#define RTAS_LOG_V6_HP_ID_DRC_COUNT_INDEXED 4
-#endif
- uint8_t hotplug_identifier; /* type of the resource identifier,
- * which serves as the discriminator
- * for the 'drc' union field below
- */
-#ifdef GUEST_SUPPORTS_MODERN
- uint8_t capabilities; /* capability flags, currently unused
- * by QEMU
- */
-#else
- uint8_t reserved;
-#endif
- union {
- uint32_t index; /* DRC index of resource to take action
- * on
- */
- uint32_t count; /* number of DR resources to take
- * action on (guest chooses which)
- */
-#ifdef GUEST_SUPPORTS_MODERN
- struct {
- uint32_t count; /* number of DR resources to take
- * action on
- */
- uint32_t index; /* DRC index of first resource to take
- * action on. guest will take action
- * on DRC index <index> through
- * DRC index <index + count - 1> in
- * sequential order
- */
- } count_indexed;
-#endif
- char name[1]; /* string representing the name of the
- * DRC to take action on
- */
- } drc;
-} QEMU_PACKED;
-
-== ibm,lrdr-capacity ==
-
-ibm,lrdr-capacity is a property in the /rtas device tree node that identifies
-the dynamic reconfiguration capabilities of the guest. It consists of a triple
-consisting of <phys>, <size> and <maxcpus>.
-
- <phys>, encoded in BE format represents the maximum address in bytes and
- hence the maximum memory that can be allocated to the guest.
-
- <size>, encoded in BE format represents the size increments in which
- memory can be hot-plugged to the guest.
-
- <maxcpus>, a BE-encoded integer, represents the maximum number of
- processors that the guest can have.
-
-pseries guests use this property to note the maximum allowed CPUs for the
-guest.
-
-== ibm,dynamic-reconfiguration-memory ==
-
-ibm,dynamic-reconfiguration-memory is a device tree node that represents
-dynamically reconfigurable logical memory blocks (LMB). This node
-is generated only when the guest advertises the support for it via
-ibm,client-architecture-support call. Memory that is not dynamically
-reconfigurable is represented by /memory nodes. The properties of this
-node that are of interest to the sPAPR memory hotplug implementation
-in QEMU are described here.
-
-ibm,lmb-size
-
-This 64bit integer defines the size of each dynamically reconfigurable LMB.
-
-ibm,associativity-lookup-arrays
-
-This property defines a lookup array in which the NUMA associativity
-information for each LMB can be found. It is a property encoded array
-that begins with an integer M, the number of associativity lists followed
-by an integer N, the number of entries per associativity list and terminated
-by M associativity lists each of length N integers.
-
-This property provides the same information as given by ibm,associativity
-property in a /memory node. Each assigned LMB has an index value between
-0 and M-1 which is used as an index into this table to select which
-associativity list to use for the LMB. This index value for each LMB
-is defined in ibm,dynamic-memory property.
-
-ibm,dynamic-memory
-
-This property describes the dynamically reconfigurable memory. It is a
-property encoded array that has an integer N, the number of LMBs followed
-by N LMB list entries.
-
-Each LMB list entry consists of the following elements:
-
-- Logical address of the start of the LMB encoded as a 64bit integer. This
- corresponds to reg property in /memory node.
-- DRC index of the LMB that corresponds to ibm,my-drc-index property
- in a /memory node.
-- Four bytes reserved for expansion.
-- Associativity list index for the LMB that is used as an index into
- ibm,associativity-lookup-arrays property described earlier. This
- is used to retrieve the right associativity list to be used for this
- LMB.
-- A 32bit flags word. The bit at bit position 0x00000008 defines whether
- the LMB is assigned to the partition as of boot time.
-
-ibm,dynamic-memory-v2
-
-This property describes the dynamically reconfigurable memory. This is
-an alternate and newer way to describe dynamically reconfigurable memory.
-It is a property encoded array that has an integer N (the number of
-LMB set entries) followed by N LMB set entries. There is an LMB set entry
-for each sequential group of LMBs that share common attributes.
-
-Each LMB set entry consists of the following elements:
-
-- Number of sequential LMBs in the entry represented by a 32bit integer.
-- Logical address of the first LMB in the set encoded as a 64bit integer.
-- DRC index of the first LMB in the set.
-- Associativity list index that is used as an index into
- ibm,associativity-lookup-arrays property described earlier. This
- is used to retrieve the right associativity list to be used for all
- the LMBs in this set.
-- A 32bit flags word that applies to all the LMBs in the set.
-
-[1] http://thread.gmane.org/gmane.linux.ports.ppc.embedded/75350/focus=106867
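
The LMB set entry layout described for ibm,dynamic-memory-v2 can be illustrated with a small decoder. This is a minimal sketch, not QEMU or guest code: the names `struct lmb_set`, `read_be32`, `read_be64` and `parse_dynamic_memory_v2` are hypothetical, and the sketch assumes that the DRC index and the associativity list index are each 32-bit big-endian integers, giving 24 bytes per entry after the leading 32-bit count N.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded form of one LMB set entry (not a QEMU type). */
struct lmb_set {
    uint32_t seq_lmbs;   /* number of sequential LMBs in the set */
    uint64_t base_addr;  /* logical address of the first LMB */
    uint32_t drc_index;  /* DRC index of the first LMB */
    uint32_t aa_index;   /* index into ibm,associativity-lookup-arrays */
    uint32_t flags;      /* flags word shared by all LMBs in the set */
};

/* Assumed entry size: 4 + 8 + 4 + 4 + 4 bytes. */
#define LMB_SET_ENTRY_BYTES 24u

static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t read_be64(const uint8_t *p)
{
    return ((uint64_t)read_be32(p) << 32) | read_be32(p + 4);
}

/*
 * Decode the raw property bytes into out[]. Returns 0 and stores the
 * entry count in *n_sets on success; returns -1 if the buffer is too
 * short for the declared count or out[] cannot hold all entries.
 */
static int parse_dynamic_memory_v2(const uint8_t *prop, size_t len,
                                   struct lmb_set *out, size_t max_sets,
                                   size_t *n_sets)
{
    assert(prop != NULL && out != NULL && n_sets != NULL);

    if (len < 4) {
        return -1;
    }
    uint32_t n = read_be32(prop);
    if ((len - 4) / LMB_SET_ENTRY_BYTES < n || n > max_sets) {
        return -1;
    }
    for (uint32_t i = 0; i < n; i++) {
        const uint8_t *e = prop + 4 + (size_t)i * LMB_SET_ENTRY_BYTES;
        out[i].seq_lmbs  = read_be32(e);
        out[i].base_addr = read_be64(e + 4);
        out[i].drc_index = read_be32(e + 12);
        out[i].aa_index  = read_be32(e + 16);
        out[i].flags     = read_be32(e + 20);
    }
    *n_sets = n;
    return 0;
}
```

A caller would pass the raw bytes of the property as read from the device tree, then expand each set into `seq_lmbs` consecutive LMBs using the size given by ibm,lmb-size.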
diff --git a/docs/specs/ppc-spapr-uv-hcalls.rst b/docs/specs/ppc-spapr-uv-hcalls.rst
new file mode 100644
index 0000000000..a00288deb3
--- /dev/null
+++ b/docs/specs/ppc-spapr-uv-hcalls.rst
@@ -0,0 +1,89 @@
+===================================
+Hypervisor calls and the Ultravisor
+===================================
+
+On PPC64 systems supporting Protected Execution Facility (PEF), system memory
+can be placed in a secured region to which only an ultravisor running in
+firmware can provide access. pSeries guests on such systems can communicate with
+the ultravisor (via ultracalls) to switch to a secure virtual machine (SVM) mode
+where the guest's memory is relocated to this secured region, making its memory
+inaccessible to normal processes/guests running on the host.
+
+The various ultracalls/hypercalls relating to SVM mode are currently only
+documented internally, but are planned for direct inclusion into the Linux on
+Power Architecture Reference document ([LoPAR]_). An internal ACR has been filed
+to reserve a hypercall number range specific to this use case to avoid any
+future conflicts with the Power Architecture Platform Reference (PAPR+)
+specification, which IBM maintains internally. This document summarizes some of
+these details as they relate to QEMU.
+
+Hypercalls needed by the ultravisor
+===================================
+
+Switching to SVM mode involves a number of hcalls issued by the ultravisor to
+the hypervisor to orchestrate the movement of guest memory to secure memory and
+various other aspects of the SVM mode. Numbers are assigned for these hcalls
+within the reserved range ``0xEF00-0xEF80``. The hcalls relevant to QEMU are
+documented below.
+
+``H_TPM_COMM`` (``0xef10``)
+---------------------------
+
+SVM file systems are encrypted using a symmetric key. This key is then
+wrapped/encrypted using the public key of a trusted system which has the private
+key stored in the system's TPM. An Ultravisor will use this hcall to
+unwrap/unseal the symmetric key using the system's TPM device or a TPM Resource
+Manager associated with the device.
+
+The Ultravisor sets up a separate session key with the TPM in advance during
+host system boot. All sensitive in and out values will be encrypted using the
+session key. Though the hypervisor will see the in and out buffers in raw form,
+any sensitive contents will generally be encrypted using this session key.
+
+Arguments:
+
+ ``r3``: ``H_TPM_COMM`` (``0xef10``)
+
+ ``r4``: ``TPM`` operation, one of:
+
+ ``TPM_COMM_OP_EXECUTE`` (``0x1``): send a request to a TPM and receive a
+ response, opening a new TPM session if one has not already been opened.
+
+ ``TPM_COMM_OP_CLOSE_SESSION`` (``0x2``): close the existing TPM session, if
+ any.
+
+ ``r5``: ``in_buffer``, guest physical address of buffer containing the
+ request. Caller may use the same address for both request and response.
+
+ ``r6``: ``in_size``, size of the in buffer. Must be less than or equal to
+ 4 KB.
+
+ ``r7``: ``out_buffer``, guest physical address of buffer to store the
+ response. Caller may use the same address for both request and response.
+
+ ``r8``: ``out_size``, size of the out buffer. Must be at least 4 KB, as this
+ is the maximum request/response size supported by most TPM implementations,
+ including the TPM Resource Manager in the Linux kernel.
+
+Return values:
+
+ ``r3``: one of the following values:
+
+ ``H_Success``: request processed successfully.
+
+ ``H_PARAMETER``: invalid TPM operation.
+
+ ``H_P2``: ``in_buffer`` is invalid.
+
+ ``H_P3``: ``in_size`` is invalid.
+
+ ``H_P4``: ``out_buffer`` is invalid.
+
+ ``H_P5``: ``out_size`` is invalid.
+
+ ``H_RESOURCE``: problem communicating with TPM.
+
+ ``H_FUNCTION``: TPM access is not currently allowed/configured.
+
+ ``r4``: For ``TPM_COMM_OP_EXECUTE``, the size of the response will be stored
+ here upon success.
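
The argument constraints documented for ``H_TPM_COMM`` can be sketched as a small checker. This is a hedged illustration only, not the QEMU implementation: `tpm_comm_check_args`, `enum tpm_comm_check` and `TPM_BUF_MAX` are hypothetical names, the order of the checks is an assumption, and the sketch assumes the size limits matter only for ``TPM_COMM_OP_EXECUTE``. Buffer address validation (``H_P2``/``H_P4``) depends on guest memory and is left out.

```c
#include <stdint.h>

#define TPM_COMM_OP_EXECUTE        0x1
#define TPM_COMM_OP_CLOSE_SESSION  0x2
#define TPM_BUF_MAX                4096u  /* 4 KB limit from the text */

/*
 * Symbolic outcomes only. The numeric hcall return codes are
 * deliberately not reproduced here.
 */
enum tpm_comm_check {
    CHECK_OK,           /* arguments pass the documented size rules */
    CHECK_H_PARAMETER,  /* invalid TPM operation */
    CHECK_H_P3,         /* in_size is invalid */
    CHECK_H_P5          /* out_size is invalid */
};

static enum tpm_comm_check tpm_comm_check_args(uint64_t op,
                                               uint64_t in_size,
                                               uint64_t out_size)
{
    switch (op) {
    case TPM_COMM_OP_CLOSE_SESSION:
        /* Assumption: closing a session does not use the buffers. */
        return CHECK_OK;
    case TPM_COMM_OP_EXECUTE:
        break;
    default:
        return CHECK_H_PARAMETER;
    }

    if (in_size > TPM_BUF_MAX) {
        return CHECK_H_P3;   /* request must be at most 4 KB */
    }
    if (out_size < TPM_BUF_MAX) {
        return CHECK_H_P5;   /* response buffer must be at least 4 KB */
    }
    return CHECK_OK;
}
```

A real handler would additionally validate both guest physical addresses and report ``H_RESOURCE`` or ``H_FUNCTION`` when the TPM cannot be reached or is not configured.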
diff --git a/docs/specs/ppc-spapr-uv-hcalls.txt b/docs/specs/ppc-spapr-uv-hcalls.txt
deleted file mode 100644
index 389c2740d7..0000000000
--- a/docs/specs/ppc-spapr-uv-hcalls.txt
+++ /dev/null
@@ -1,76 +0,0 @@
-On PPC64 systems supporting Protected Execution Facility (PEF), system
-memory can be placed in a secured region where only an "ultravisor"
-running in firmware can provide to access it. pseries guests on such
-systems can communicate with the ultravisor (via ultracalls) to switch to a
-secure VM mode (SVM) where the guest's memory is relocated to this secured
-region, making its memory inaccessible to normal processes/guests running on
-the host.
-
-The various ultracalls/hypercalls relating to SVM mode are currently
-only documented internally, but are planned for direct inclusion into the
-public OpenPOWER version of the PAPR specification (LoPAPR/LoPAR). An internal
-ACR has been filed to reserve a hypercall number range specific to this
-use-case to avoid any future conflicts with the internally-maintained PAPR
-specification. This document summarizes some of these details as they relate
-to QEMU.
-
-== hypercalls needed by the ultravisor ==
-
-Switching to SVM mode involves a number of hcalls issued by the ultravisor
-to the hypervisor to orchestrate the movement of guest memory to secure
-memory and various other aspects SVM mode. Numbers are assigned for these
-hcalls within the reserved range 0xEF00-0xEF80. The below documents the
-hcalls relevant to QEMU.
-
-- H_TPM_COMM (0xef10)
-
- For TPM_COMM_OP_EXECUTE operation:
- Send a request to a TPM and receive a response, opening a new TPM session
- if one has not already been opened.
-
- For TPM_COMM_OP_CLOSE_SESSION operation:
- Close the existing TPM session, if any.
-
- Arguments:
-
- r3 : H_TPM_COMM (0xef10)
- r4 : TPM operation, one of:
- TPM_COMM_OP_EXECUTE (0x1)
- TPM_COMM_OP_CLOSE_SESSION (0x2)
- r5 : in_buffer, guest physical address of buffer containing the request
- - Caller may use the same address for both request and response
- r6 : in_size, size of the in buffer
- - Must be less than or equal to 4KB
- r7 : out_buffer, guest physical address of buffer to store the response
- - Caller may use the same address for both request and response
- r8 : out_size, size of the out buffer
- - Must be at least 4KB, as this is the maximum request/response size
- supported by most TPM implementations, including the TPM Resource
- Manager in the linux kernel.
-
- Return values:
-
- r3 : H_Success request processed successfully
- H_PARAMETER invalid TPM operation
- H_P2 in_buffer is invalid
- H_P3 in_size is invalid
- H_P4 out_buffer is invalid
- H_P5 out_size is invalid
- H_RESOURCE problem communicating with TPM
- H_FUNCTION TPM access is not currently allowed/configured
- r4 : For TPM_COMM_OP_EXECUTE, the size of the response will be stored here
- upon success.
-
- Use-case/notes:
-
- SVM filesystems are encrypted using a symmetric key. This key is then
- wrapped/encrypted using the public key of a trusted system which has the
- private key stored in the system's TPM. An Ultravisor will use this
- hcall to unwrap/unseal the symmetric key using the system's TPM device
- or a TPM Resource Manager associated with the device.
-
- The Ultravisor sets up a separate session key with the TPM in advance
- during host system boot. All sensitive in and out values will be
- encrypted using the session key. Though the hypervisor will see the 'in'
- and 'out' buffers in raw form, any sensitive contents will generally be
- encrypted using this session key.
diff --git a/docs/system/ppc/pseries.rst b/docs/system/ppc/pseries.rst
index 1689324815..569237dc0c 100644
--- a/docs/system/ppc/pseries.rst
+++ b/docs/system/ppc/pseries.rst
@@ -110,16 +110,12 @@ can also be found in QEMU documentation:
.. toctree::
:maxdepth: 1
+ ../../specs/ppc-spapr-hotplug.rst
../../specs/ppc-spapr-hcalls.rst
../../specs/ppc-spapr-numa.rst
+ ../../specs/ppc-spapr-uv-hcalls.rst
../../specs/ppc-spapr-xive.rst
-Other documentation available in QEMU docs directory:
-
-* Hot plug (``/docs/specs/ppc-spapr-hotplug.txt``).
-* Hypervisor calls needed by the Ultravisor
- (``/docs/specs/ppc-spapr-uv-hcalls.txt``).
-
Switching between the KVM-PR and KVM-HV kernel module
=====================================================