From bb1cff6ee044cb13e2e81609a0b9a86378f85f1f Mon Sep 17 00:00:00 2001
From: Peter Maydell
Date: Wed, 27 Sep 2023 16:12:00 +0100
Subject: docs/specs/ivshmem-spec: Convert to rST
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Convert docs/specs/ivshmem-spec.txt to rST format.

In converting, I have dropped the sections on the device's command
line interface and usage, as they are already covered by the
user-facing docs in system/devices/ivshmem.rst.

I have also removed the reference to Memnic, because the URL is dead
and a web search suggests that whatever this was it's pretty much
sunk without trace.

Signed-off-by: Peter Maydell
Message-id: 20230927151205.70930-4-peter.maydell@linaro.org
Reviewed-by: Philippe Mathieu-Daudé
---
 docs/specs/index.rst            |   1 +
 docs/specs/ivshmem-spec.rst     | 241 +++++++++++++++++++++++++++++++++++++
 docs/specs/ivshmem-spec.txt     | 258 ----------------------------------------
 docs/specs/pci-ids.rst          |   2 +-
 docs/system/devices/ivshmem.rst |   2 +-
 5 files changed, 244 insertions(+), 260 deletions(-)
 create mode 100644 docs/specs/ivshmem-spec.rst
 delete mode 100644 docs/specs/ivshmem-spec.txt

(limited to 'docs')

diff --git a/docs/specs/index.rst b/docs/specs/index.rst
index 30a0cf3d47..e60c837754 100644
--- a/docs/specs/index.rst
+++ b/docs/specs/index.rst
@@ -26,3 +26,4 @@ guest hardware that is specific to QEMU.
    fw_cfg
    vmw_pvscsi-spec
    edu
+   ivshmem-spec
diff --git a/docs/specs/ivshmem-spec.rst b/docs/specs/ivshmem-spec.rst
new file mode 100644
index 0000000000..2d8e80055b
--- /dev/null
+++ b/docs/specs/ivshmem-spec.rst
@@ -0,0 +1,241 @@
+======================================================
+Device Specification for Inter-VM shared memory device
+======================================================
+
+The Inter-VM shared memory device (ivshmem) is designed to share a
+memory region between multiple QEMU processes running different guests
+and the host. In order for all guests to be able to pick up the
+shared memory area, it is modeled by QEMU as a PCI device exposing
+said memory to the guest as a PCI BAR.
+
+The device can use a shared memory object on the host directly, or it
+can obtain one from an ivshmem server.
+
+In the latter case, the device can additionally interrupt its peers, and
+get interrupted by its peers.
+
+For information on configuring the ivshmem device on the QEMU
+command line, see :doc:`../system/devices/ivshmem`.
+
+The ivshmem PCI device's guest interface
+========================================
+
+The device has vendor ID 1af4, device ID 1110, revision 1. Before
+QEMU 2.6.0, it had revision 0.
+
+PCI BARs
+--------
+
+The ivshmem PCI device has two or three BARs:
+
+- BAR0 holds device registers (256 Byte MMIO)
+- BAR1 holds MSI-X table and PBA (only ivshmem-doorbell)
+- BAR2 maps the shared memory object
+
+There are two ways to use this device:
+
+- If you only need the shared memory part, BAR2 suffices. This way,
+  you have access to the shared memory in the guest and can use it as
+  you see fit.
+
+- If you additionally need the capability for peers to interrupt each
+  other, you need BAR0 and BAR1. You will most likely want to write a
+  kernel driver to handle interrupts. Requires the device to be
+  configured for interrupts, obviously.
+
+Before QEMU 2.6.0, BAR2 can initially be invalid if the device is
+configured for interrupts. It becomes safely accessible only after
+the ivshmem server provided the shared memory. These devices have PCI
+revision 0 rather than 1. Guest software should wait for the
+IVPosition register (described below) to become non-negative before
+accessing BAR2.
+
+Revision 0 of the device is not capable to tell guest software whether
+it is configured for interrupts.
+
+PCI device registers
+--------------------
+
+BAR 0 contains the following registers:
+
+::
+
+  Offset  Size  Access      On reset  Function
+      0     4   read/write        0   Interrupt Mask
+                                      bit 0: peer interrupt (rev 0)
+                                             reserved (rev 1)
+                                      bit 1..31: reserved
+      4     4   read/write        0   Interrupt Status
+                                      bit 0: peer interrupt (rev 0)
+                                             reserved (rev 1)
+                                      bit 1..31: reserved
+      8     4   read-only   0 or ID   IVPosition
+     12     4   write-only      N/A   Doorbell
+                                      bit 0..15: vector
+                                      bit 16..31: peer ID
+     16   240   none            N/A   reserved
+
+Software should only access the registers as specified in column
+"Access". Reserved bits should be ignored on read, and preserved on
+write.
+
+In revision 0 of the device, Interrupt Status and Mask Register
+together control the legacy INTx interrupt when the device has no
+MSI-X capability: INTx is asserted when the bit-wise AND of Status and
+Mask is non-zero and the device has no MSI-X capability. Interrupt
+Status Register bit 0 becomes 1 when an interrupt request from a peer
+is received. Reading the register clears it.
+
+IVPosition Register: if the device is not configured for interrupts,
+this is zero. Else, it is the device's ID (between 0 and 65535).
+
+Before QEMU 2.6.0, the register may read -1 for a short while after
+reset. These devices have PCI revision 0 rather than 1.
+
+There is no good way for software to find out whether the device is
+configured for interrupts. A positive IVPosition means interrupts,
+but zero could be either.
+
+Doorbell Register: writing this register requests to interrupt a peer.
+The written value's high 16 bits are the ID of the peer to interrupt,
+and its low 16 bits select an interrupt vector.
+
+If the device is not configured for interrupts, the write is ignored.
+
+If the interrupt hasn't completed setup, the write is ignored. The
+device is not capable to tell guest software whether setup is
+complete. Interrupts can regress to this state on migration.
+
+If the peer with the requested ID isn't connected, or it has fewer
+interrupt vectors connected, the write is ignored. The device is not
+capable to tell guest software what peers are connected, or how many
+interrupt vectors are connected.
+
+The peer's interrupt for this vector then becomes pending. There is
+no way for software to clear the pending bit, and a polling mode of
+operation is therefore impossible.
+
+If the peer is a revision 0 device without MSI-X capability, its
+Interrupt Status register is set to 1. This asserts INTx unless
+masked by the Interrupt Mask register. The device is not capable to
+communicate the interrupt vector to guest software then.
+
+With multiple MSI-X vectors, different vectors can be used to indicate
+different events have occurred. The semantics of interrupt vectors
+are left to the application.
+
+Interrupt infrastructure
+========================
+
+When configured for interrupts, the peers share eventfd objects in
+addition to shared memory. The shared resources are managed by an
+ivshmem server.
+
+The ivshmem server
+------------------
+
+The server listens on a UNIX domain socket.
+
+For each new client that connects to the server, the server
+
+- picks an ID,
+- creates eventfd file descriptors for the interrupt vectors,
+- sends the ID and the file descriptor for the shared memory to the
+  new client,
+- sends connect notifications for the new client to the other clients
+  (these contain file descriptors for sending interrupts),
+- sends connect notifications for the other clients to the new client,
+  and
+- sends interrupt setup messages to the new client (these contain file
+  descriptors for receiving interrupts).
+
+The first client to connect to the server receives ID zero.
+
+When a client disconnects from the server, the server sends disconnect
+notifications to the other clients.
+
+The next section describes the protocol in detail.
+
+If the server terminates without sending disconnect notifications for
+its connected clients, the clients can elect to continue. They can
+communicate with each other normally, but won't receive disconnect
+notification on disconnect, and no new clients can connect. There is
+no way for the clients to connect to a restarted server. The device
+is not capable to tell guest software whether the server is still up.
+
+Example server code is in contrib/ivshmem-server/. Not to be used in
+production. It assumes all clients use the same number of interrupt
+vectors.
+
+A standalone client is in contrib/ivshmem-client/. It can be useful
+for debugging.
+
+The ivshmem Client-Server Protocol
+----------------------------------
+
+An ivshmem device configured for interrupts connects to an ivshmem
+server. This section details the protocol between the two.
+
+The connection is one-way: the server sends messages to the client.
+Each message consists of a single 8 byte little-endian signed number,
+and may be accompanied by a file descriptor via SCM_RIGHTS. Both
+client and server close the connection on error.
+
+Note: QEMU currently doesn't close the connection right on error, but
+only when the character device is destroyed.
+
+On connect, the server sends the following messages in order:
+
+1. The protocol version number, currently zero. The client should
+   close the connection on receipt of versions it can't handle.
+
+2. The client's ID. This is unique among all clients of this server.
+   IDs must be between 0 and 65535, because the Doorbell register
+   provides only 16 bits for them.
+
+3. The number -1, accompanied by the file descriptor for the shared
+   memory.
+
+4. Connect notifications for existing other clients, if any. This is
+   a peer ID (number between 0 and 65535 other than the client's ID),
+   repeated N times. Each repetition is accompanied by one file
+   descriptor. These are for interrupting the peer with that ID using
+   vector 0,..,N-1, in order. If the client is configured for fewer
+   vectors, it closes the extra file descriptors. If it is configured
+   for more, the extra vectors remain unconnected.
+
+5. Interrupt setup. This is the client's own ID, repeated N times.
+   Each repetition is accompanied by one file descriptor. These are
+   for receiving interrupts from peers using vector 0,..,N-1, in
+   order. If the client is configured for fewer vectors, it closes
+   the extra file descriptors. If it is configured for more, the
+   extra vectors remain unconnected.
+
+From then on, the server sends these kinds of messages:
+
+6. Connection / disconnection notification. This is a peer ID.
+
+  - If the number comes with a file descriptor, it's a connection
+    notification, exactly like in step 4.
+
+  - Else, it's a disconnection notification for the peer with that ID.
+
+Known bugs:
+
+* The protocol changed incompatibly in QEMU 2.5. Before, messages
+  were native endian long, and there was no version number.
+
+* The protocol is poorly designed.
+
+The ivshmem Client-Client Protocol
+----------------------------------
+
+An ivshmem device configured for interrupts receives eventfd file
+descriptors for interrupting peers and getting interrupted by peers
+from the server, as explained in the previous section.
+
+To interrupt a peer, the device writes the 8-byte integer 1 in native
+byte order to the respective file descriptor.
+
+To receive an interrupt, the device reads and discards as many 8-byte
+integers as it can.
diff --git a/docs/specs/ivshmem-spec.txt b/docs/specs/ivshmem-spec.txt
deleted file mode 100644
index 1beb3a01ec..0000000000
--- a/docs/specs/ivshmem-spec.txt
+++ /dev/null
@@ -1,258 +0,0 @@
-= Device Specification for Inter-VM shared memory device =
-
-The Inter-VM shared memory device (ivshmem) is designed to share a
-memory region between multiple QEMU processes running different guests
-and the host. In order for all guests to be able to pick up the
-shared memory area, it is modeled by QEMU as a PCI device exposing
-said memory to the guest as a PCI BAR.
-
-The device can use a shared memory object on the host directly, or it
-can obtain one from an ivshmem server.
-
-In the latter case, the device can additionally interrupt its peers, and
-get interrupted by its peers.
-
-
-== Configuring the ivshmem PCI device ==
-
-There are two basic configurations:
-
-- Just shared memory:
-
-      -device ivshmem-plain,memdev=HMB,...
-
-  This uses host memory backend HMB. It should have option "share"
-  set.
-
-- Shared memory plus interrupts:
-
-      -device ivshmem-doorbell,chardev=CHR,vectors=N,...
-
-  An ivshmem server must already be running on the host. The device
-  connects to the server's UNIX domain socket via character device
-  CHR.
-
-  Each peer gets assigned a unique ID by the server. IDs must be
-  between 0 and 65535.
-
-  Interrupts are message-signaled (MSI-X). vectors=N configures the
-  number of vectors to use.
-
-For more details on ivshmem device properties, see the QEMU Emulator
-user documentation.
-
-
-== The ivshmem PCI device's guest interface ==
-
-The device has vendor ID 1af4, device ID 1110, revision 1. Before
-QEMU 2.6.0, it had revision 0.
-
-=== PCI BARs ===
-
-The ivshmem PCI device has two or three BARs:
-
-- BAR0 holds device registers (256 Byte MMIO)
-- BAR1 holds MSI-X table and PBA (only ivshmem-doorbell)
-- BAR2 maps the shared memory object
-
-There are two ways to use this device:
-
-- If you only need the shared memory part, BAR2 suffices. This way,
-  you have access to the shared memory in the guest and can use it as
-  you see fit. Memnic, for example, uses ivshmem this way from guest
-  user space (see http://dpdk.org/browse/memnic).
-
-- If you additionally need the capability for peers to interrupt each
-  other, you need BAR0 and BAR1. You will most likely want to write a
-  kernel driver to handle interrupts. Requires the device to be
-  configured for interrupts, obviously.
-
-Before QEMU 2.6.0, BAR2 can initially be invalid if the device is
-configured for interrupts. It becomes safely accessible only after
-the ivshmem server provided the shared memory. These devices have PCI
-revision 0 rather than 1. Guest software should wait for the
-IVPosition register (described below) to become non-negative before
-accessing BAR2.
-
-Revision 0 of the device is not capable to tell guest software whether
-it is configured for interrupts.
-
-=== PCI device registers ===
-
-BAR 0 contains the following registers:
-
-  Offset  Size  Access      On reset  Function
-      0     4   read/write        0   Interrupt Mask
-                                      bit 0: peer interrupt (rev 0)
-                                             reserved (rev 1)
-                                      bit 1..31: reserved
-      4     4   read/write        0   Interrupt Status
-                                      bit 0: peer interrupt (rev 0)
-                                             reserved (rev 1)
-                                      bit 1..31: reserved
-      8     4   read-only   0 or ID   IVPosition
-     12     4   write-only      N/A   Doorbell
-                                      bit 0..15: vector
-                                      bit 16..31: peer ID
-     16   240   none            N/A   reserved
-
-Software should only access the registers as specified in column
-"Access". Reserved bits should be ignored on read, and preserved on
-write.
-
-In revision 0 of the device, Interrupt Status and Mask Register
-together control the legacy INTx interrupt when the device has no
-MSI-X capability: INTx is asserted when the bit-wise AND of Status and
-Mask is non-zero and the device has no MSI-X capability. Interrupt
-Status Register bit 0 becomes 1 when an interrupt request from a peer
-is received. Reading the register clears it.
-
-IVPosition Register: if the device is not configured for interrupts,
-this is zero. Else, it is the device's ID (between 0 and 65535).
-
-Before QEMU 2.6.0, the register may read -1 for a short while after
-reset. These devices have PCI revision 0 rather than 1.
-
-There is no good way for software to find out whether the device is
-configured for interrupts. A positive IVPosition means interrupts,
-but zero could be either.
-
-Doorbell Register: writing this register requests to interrupt a peer.
-The written value's high 16 bits are the ID of the peer to interrupt,
-and its low 16 bits select an interrupt vector.
-
-If the device is not configured for interrupts, the write is ignored.
-
-If the interrupt hasn't completed setup, the write is ignored. The
-device is not capable to tell guest software whether setup is
-complete. Interrupts can regress to this state on migration.
-
-If the peer with the requested ID isn't connected, or it has fewer
-interrupt vectors connected, the write is ignored. The device is not
-capable to tell guest software what peers are connected, or how many
-interrupt vectors are connected.
-
-The peer's interrupt for this vector then becomes pending. There is
-no way for software to clear the pending bit, and a polling mode of
-operation is therefore impossible.
-
-If the peer is a revision 0 device without MSI-X capability, its
-Interrupt Status register is set to 1. This asserts INTx unless
-masked by the Interrupt Mask register. The device is not capable to
-communicate the interrupt vector to guest software then.
-
-With multiple MSI-X vectors, different vectors can be used to indicate
-different events have occurred. The semantics of interrupt vectors
-are left to the application.
-
-
-== Interrupt infrastructure ==
-
-When configured for interrupts, the peers share eventfd objects in
-addition to shared memory. The shared resources are managed by an
-ivshmem server.
-
-=== The ivshmem server ===
-
-The server listens on a UNIX domain socket.
-
-For each new client that connects to the server, the server
-- picks an ID,
-- creates eventfd file descriptors for the interrupt vectors,
-- sends the ID and the file descriptor for the shared memory to the
-  new client,
-- sends connect notifications for the new client to the other clients
-  (these contain file descriptors for sending interrupts),
-- sends connect notifications for the other clients to the new client,
-  and
-- sends interrupt setup messages to the new client (these contain file
-  descriptors for receiving interrupts).
-
-The first client to connect to the server receives ID zero.
-
-When a client disconnects from the server, the server sends disconnect
-notifications to the other clients.
-
-The next section describes the protocol in detail.
-
-If the server terminates without sending disconnect notifications for
-its connected clients, the clients can elect to continue. They can
-communicate with each other normally, but won't receive disconnect
-notification on disconnect, and no new clients can connect. There is
-no way for the clients to connect to a restarted server. The device
-is not capable to tell guest software whether the server is still up.
-
-Example server code is in contrib/ivshmem-server/. Not to be used in
-production. It assumes all clients use the same number of interrupt
-vectors.
-
-A standalone client is in contrib/ivshmem-client/. It can be useful
-for debugging.
-
-=== The ivshmem Client-Server Protocol ===
-
-An ivshmem device configured for interrupts connects to an ivshmem
-server. This section details the protocol between the two.
-
-The connection is one-way: the server sends messages to the client.
-Each message consists of a single 8 byte little-endian signed number,
-and may be accompanied by a file descriptor via SCM_RIGHTS. Both
-client and server close the connection on error.
-
-Note: QEMU currently doesn't close the connection right on error, but
-only when the character device is destroyed.
-
-On connect, the server sends the following messages in order:
-
-1. The protocol version number, currently zero. The client should
-   close the connection on receipt of versions it can't handle.
-
-2. The client's ID. This is unique among all clients of this server.
-   IDs must be between 0 and 65535, because the Doorbell register
-   provides only 16 bits for them.
-
-3. The number -1, accompanied by the file descriptor for the shared
-   memory.
-
-4. Connect notifications for existing other clients, if any. This is
-   a peer ID (number between 0 and 65535 other than the client's ID),
-   repeated N times. Each repetition is accompanied by one file
-   descriptor. These are for interrupting the peer with that ID using
-   vector 0,..,N-1, in order. If the client is configured for fewer
-   vectors, it closes the extra file descriptors. If it is configured
-   for more, the extra vectors remain unconnected.
-
-5. Interrupt setup. This is the client's own ID, repeated N times.
-   Each repetition is accompanied by one file descriptor. These are
-   for receiving interrupts from peers using vector 0,..,N-1, in
-   order. If the client is configured for fewer vectors, it closes
-   the extra file descriptors. If it is configured for more, the
-   extra vectors remain unconnected.
-
-From then on, the server sends these kinds of messages:
-
-6. Connection / disconnection notification. This is a peer ID.
-
-  - If the number comes with a file descriptor, it's a connection
-    notification, exactly like in step 4.
-
-  - Else, it's a disconnection notification for the peer with that ID.
-
-Known bugs:
-
-* The protocol changed incompatibly in QEMU 2.5. Before, messages
-  were native endian long, and there was no version number.
-
-* The protocol is poorly designed.
-
-=== The ivshmem Client-Client Protocol ===
-
-An ivshmem device configured for interrupts receives eventfd file
-descriptors for interrupting peers and getting interrupted by peers
-from the server, as explained in the previous section.
-
-To interrupt a peer, the device writes the 8-byte integer 1 in native
-byte order to the respective file descriptor.
-
-To receive an interrupt, the device reads and discards as many 8-byte
-integers as it can.
diff --git a/docs/specs/pci-ids.rst b/docs/specs/pci-ids.rst
index d6707fa069..c0a3dec2e7 100644
--- a/docs/specs/pci-ids.rst
+++ b/docs/specs/pci-ids.rst
@@ -50,7 +50,7 @@ maintained as part of the virtio specification.
   by QEMU.
 
 1af4:1110
-  ivshmem device (shared memory, ``docs/specs/ivshmem-spec.txt``)
+  ivshmem device (:doc:`ivshmem-spec`)
 
 All other device IDs are reserved.
 
diff --git a/docs/system/devices/ivshmem.rst b/docs/system/devices/ivshmem.rst
index e7aaf34c20..ce71e25663 100644
--- a/docs/system/devices/ivshmem.rst
+++ b/docs/system/devices/ivshmem.rst
@@ -33,7 +33,7 @@ syntax when using the shared memory server is:
 When using the server, the guest will be assigned a VM ID (>=0) that
 allows guests using the same server to communicate via interrupts.
 Guests can read their VM ID from a device register (see
-ivshmem-spec.txt).
+:doc:`../../specs/ivshmem-spec`).
 
 Migration with ivshmem
 ~~~~~~~~~~~~~~~~~~~~~~
-- 
cgit v1.2.3
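
Reader's note, not part of the patch: the converted spec describes two small binary encodings, the Doorbell register value (peer ID in the high 16 bits, vector in the low 16 bits) and the server-to-client messages (a single 8-byte little-endian signed number). The following is a minimal Python sketch of both, for illustration only. The function names doorbell_value and decode_server_message are hypothetical and do not exist in QEMU; the sketch does not touch a real device or socket.

```python
import struct


def doorbell_value(peer_id: int, vector: int) -> int:
    """Compose the 32-bit value a guest would write to the Doorbell
    register (BAR0 offset 12): bits 16..31 peer ID, bits 0..15 vector.
    Hypothetical helper, named here for illustration only."""
    if not 0 <= peer_id <= 0xFFFF:
        raise ValueError("peer ID must be between 0 and 65535")
    if not 0 <= vector <= 0xFFFF:
        raise ValueError("vector must fit in 16 bits")
    return (peer_id << 16) | vector


def decode_server_message(raw: bytes) -> int:
    """Decode one server-to-client message: a single 8-byte
    little-endian signed number. Any accompanying file descriptor
    (passed via SCM_RIGHTS) is outside the scope of this sketch."""
    if len(raw) != 8:
        raise ValueError("a message is exactly 8 bytes")
    (value,) = struct.unpack("<q", raw)
    return value


if __name__ == "__main__":
    # Ring vector 1 of the peer with ID 3.
    print(hex(doorbell_value(3, 1)))
    # The shared-memory message in step 3 of the protocol carries -1.
    print(decode_server_message(struct.pack("<q", -1)))
```

Range checks are included because the spec limits IDs to 0..65535; whether a real guest driver validates them before writing the register is an assumption of this sketch.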