Diffstat (limited to 'docs')
-rw-r--r-- docs/devel/atomics.txt 57
-rw-r--r-- docs/devel/qapi-code-gen.txt 87
-rw-r--r-- docs/interop/qmp-spec.txt 36
-rw-r--r-- docs/replay.txt 163
4 files changed, 274 insertions, 69 deletions
diff --git a/docs/devel/atomics.txt b/docs/devel/atomics.txt
index 10c5fa37e8..a4db3a4aaa 100644
--- a/docs/devel/atomics.txt
+++ b/docs/devel/atomics.txt
@@ -122,20 +122,30 @@ In general, if the algorithm you are writing includes both writes
and reads on the same side, it is generally simpler to use sequentially
consistent primitives.
-When using this model, variables are accessed with atomic_read() and
-atomic_set(), and restrictions to the ordering of accesses is enforced
+When using this model, variables are accessed with:
+
+- atomic_read() and atomic_set(); these prevent the compiler from
+ optimizing accesses out of existence and creating unsolicited
+ accesses, but do not otherwise impose any ordering on loads and
+ stores: both the compiler and the processor are free to reorder
+ them.
+
+- atomic_load_acquire(), which guarantees that the LOAD appears to
+ happen, with respect to the other components of the system,
+ before all the LOAD or STORE operations specified afterwards.
+ Operations coming before atomic_load_acquire() can still be
+ reordered after it.
+
+- atomic_store_release(), which guarantees that the STORE appears to
+ happen, with respect to the other components of the system,
+ after all the LOAD or STORE operations specified before it.
+ Operations coming after atomic_store_release() can still be
+ reordered before it.
+
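As a rough illustration of the acquire/release pairing described above, the sketch below implements one-shot message passing. It uses C11 <stdatomic.h> operations in place of QEMU's atomic_store_release() and atomic_load_acquire() so that it compiles on its own; the variables data and ready and the functions writer, reader and run_message_passing_demo are hypothetical names. The writer's release store to ready cannot be reordered before its store to data, and the reader's acquire load of ready cannot be reordered after its load of data, so a reader that sees ready == true also sees data == 42.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state: a payload and a flag that publishes it. */
static int data;
static atomic_bool ready;

static void *writer(void *arg)
{
    (void)arg;
    data = 42;  /* plain store, ordered by the release store below */
    /* Release: the store to data cannot move after this store. */
    atomic_store_explicit(&ready, true, memory_order_release);
    return NULL;
}

static void *reader(void *arg)
{
    int *out = arg;
    /* Acquire: the load of data below cannot move before this load. */
    while (!atomic_load_explicit(&ready, memory_order_acquire)) {
        /* spin until the writer publishes */
    }
    *out = data;  /* guaranteed to observe 42 */
    return NULL;
}

/* Runs one reader and one writer; returns the value the reader saw. */
static int run_message_passing_demo(void)
{
    pthread_t w, r;
    int seen = 0;

    data = 0;
    atomic_store(&ready, false);
    pthread_create(&r, NULL, reader, &seen);
    pthread_create(&w, NULL, writer, NULL);
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return seen;
}
```

With relaxed loads and stores (the atomic_read()/atomic_set() case) in place of the acquire/release pair, the reader could legally observe ready == true and still read a stale data.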
+Restrictions on the ordering of accesses can also be specified
using the memory barrier macros: smp_rmb(), smp_wmb(), smp_mb(),
smp_mb_acquire(), smp_mb_release(), smp_read_barrier_depends().
-atomic_read() and atomic_set() prevents the compiler from using
-optimizations that might otherwise optimize accesses out of existence
-on the one hand, or that might create unsolicited accesses on the other.
-In general this should not have any effect, because the same compiler
-barriers are already implied by memory barriers. However, it is useful
-to do so, because it tells readers which variables are shared with
-other threads, and which are local to the current thread or protected
-by other, more mundane means.
-
Memory barriers control the order of references to shared memory.
They come in six kinds:
@@ -232,7 +242,7 @@ make atomic_mb_set() the more expensive operation.
There are two common cases in which atomic_mb_read and atomic_mb_set
generate too many memory barriers, and thus it can be useful to manually
-place barriers instead:
+place barriers, or use atomic_load_acquire/atomic_store_release instead:
- when a data structure has one thread that is always a writer
and one thread that is always a reader, manual placement of
@@ -243,18 +253,15 @@ place barriers instead:
thread 1 thread 1
------------------------- ------------------------
(other writes)
- smp_mb_release()
- atomic_mb_set(&a, x) atomic_set(&a, x)
- smp_wmb()
- atomic_mb_set(&b, y) atomic_set(&b, y)
+ atomic_mb_set(&a, x) atomic_store_release(&a, x)
+ atomic_mb_set(&b, y) atomic_store_release(&b, y)
=>
thread 2 thread 2
------------------------- ------------------------
- y = atomic_mb_read(&b) y = atomic_read(&b)
- smp_rmb()
- x = atomic_mb_read(&a) x = atomic_read(&a)
- smp_mb_acquire()
+ y = atomic_mb_read(&b) y = atomic_load_acquire(&b)
+ x = atomic_mb_read(&a) x = atomic_load_acquire(&a)
+ (other reads)
Note that the barrier between the stores in thread 1, and between
the loads in thread 2, has been optimized here to a write or a
@@ -276,7 +283,6 @@ place barriers instead:
smp_mb_acquire();
Similarly, atomic_mb_set() can be transformed as follows:
- smp_mb():
smp_mb_release();
for (i = 0; i < 10; i++) => for (i = 0; i < 10; i++)
@@ -284,6 +290,8 @@ place barriers instead:
smp_mb();
+ The other thread can still use atomic_mb_read()/atomic_mb_set().
+
The two tricks can be combined. In this case, splitting a loop in
two lets you hoist the barriers out of the loops _and_ eliminate the
expensive smp_mb():
@@ -296,8 +304,6 @@ expensive smp_mb():
atomic_set(&a[i], false);
smp_mb();
- The other thread can still use atomic_mb_read()/atomic_mb_set()
-
Memory barrier pairing
----------------------
@@ -386,10 +392,7 @@ and memory barriers, and the equivalents in QEMU:
note that smp_store_mb() is a little weaker than atomic_mb_set().
atomic_mb_read() compiles to the same instructions as Linux's
smp_load_acquire(), but this should be treated as an implementation
- detail. QEMU does have atomic_load_acquire() and atomic_store_release()
- macros, but for now they are only used within atomic.h. This may
- change in the future.
-
+ detail.
SOURCES
=======
diff --git a/docs/devel/qapi-code-gen.txt b/docs/devel/qapi-code-gen.txt
index 25b7180a18..a569d24745 100644
--- a/docs/devel/qapi-code-gen.txt
+++ b/docs/devel/qapi-code-gen.txt
@@ -554,9 +554,12 @@ following example objects:
=== Commands ===
+--- General Command Layout ---
+
Usage: { 'command': STRING, '*data': COMPLEX-TYPE-NAME-OR-DICT,
'*returns': TYPE-NAME, '*boxed': true,
- '*gen': false, '*success-response': false }
+ '*gen': false, '*success-response': false,
+ '*allow-oob': true }
Commands are defined by using a dictionary containing several members,
where three members are most common. The 'command' member is a
@@ -636,6 +639,49 @@ possible, the command expression should include the optional key
'success-response' with boolean value false. So far, only QGA makes
use of this member.
+A command can be declared to support Out-Of-Band (OOB) execution. By
+default, commands do not support OOB. To declare a command that
+supports it, the schema includes an extra 'allow-oob' field. For
+example:
+
+ { 'command': 'migrate_recover',
+ 'data': { 'uri': 'str' }, 'allow-oob': true }
+
+To execute a command with out-of-band priority, the client specifies
+the "control" field in the request, with "run-oob" set to
+true. Example:
+
+ => { "execute": "command-support-oob",
+ "arguments": { ... },
+ "control": { "run-oob": true } }
+ <= { "return": { } }
+
+Without it, even the commands that support out-of-band execution will
+still be run in-band.
+
+Under normal QMP command execution, the following apply to each
+command:
+
+- They are executed in order,
+- They run only in the main thread of QEMU,
+- They have the BQL taken during execution.
+
+When a command is executed with OOB, the following changes occur:
+
+- They can be completed before a pending in-band command,
+- They run in a dedicated monitor thread,
+- They do not take the BQL during execution.
+
+OOB command handlers must satisfy the following conditions:
+
+- It executes extremely fast,
+- It does not take any lock, or, it can take very small locks if all
+ critical regions also follow the rules for OOB command handler code,
+- It does not invoke system calls that may block,
+- It does not access guest RAM that may block when userfaultfd is
+ enabled for postcopy live migration.
+
+If in doubt, do not implement OOB execution support.
=== Events ===
@@ -739,10 +785,12 @@ references by name.
QAPI schema definitions not reachable that way are omitted.
The SchemaInfo for a command has meta-type "command", and variant
-members "arg-type" and "ret-type". On the wire, the "arguments"
-member of a client's "execute" command must conform to the object type
-named by "arg-type". The "return" member that the server passes in a
-success response conforms to the type named by "ret-type".
+members "arg-type", "ret-type" and "allow-oob". On the wire, the
+"arguments" member of a client's "execute" command must conform to the
+object type named by "arg-type". The "return" member that the server
+passes in a success response conforms to the type named by
+"ret-type". When "allow-oob" is set, it means the command supports
+out-of-band execution.
If the command takes no arguments, "arg-type" names an object type
without members. Likewise, if the command returns nothing, "ret-type"
@@ -1319,18 +1367,27 @@ Example:
#ifndef EXAMPLE_QMP_INTROSPECT_H
#define EXAMPLE_QMP_INTROSPECT_H
- extern const char example_qmp_schema_json[];
+ extern const QLitObject example_qmp_schema_qlit;
#endif
$ cat qapi-generated/example-qapi-introspect.c
[Uninteresting stuff omitted...]
- const char example_qmp_schema_json[] = "["
- "{\"arg-type\": \"0\", \"meta-type\": \"event\", \"name\": \"MY_EVENT\"}, "
- "{\"arg-type\": \"1\", \"meta-type\": \"command\", \"name\": \"my-command\", \"ret-type\": \"2\"}, "
- "{\"members\": [], \"meta-type\": \"object\", \"name\": \"0\"}, "
- "{\"members\": [{\"name\": \"arg1\", \"type\": \"[2]\"}], \"meta-type\": \"object\", \"name\": \"1\"}, "
- "{\"members\": [{\"name\": \"integer\", \"type\": \"int\"}, {\"default\": null, \"name\": \"string\", \"type\": \"str\"}], \"meta-type\": \"object\", \"name\": \"2\"}, "
- "{\"element-type\": \"2\", \"meta-type\": \"array\", \"name\": \"[2]\"}, "
- "{\"json-type\": \"int\", \"meta-type\": \"builtin\", \"name\": \"int\"}, "
- "{\"json-type\": \"string\", \"meta-type\": \"builtin\", \"name\": \"str\"}]";
+ const QLitObject example_qmp_schema_qlit = QLIT_QLIST(((QLitObject[]) {
+ QLIT_QDICT(((QLitDictEntry[]) {
+ { "arg-type", QLIT_QSTR("0") },
+ { "meta-type", QLIT_QSTR("event") },
+ { "name", QLIT_QSTR("MY_EVENT") },
+ { }
+ })),
+ QLIT_QDICT(((QLitDictEntry[]) {
+ { "members", QLIT_QLIST(((QLitObject[]) {
+ { }
+ })) },
+ { "meta-type", QLIT_QSTR("object") },
+ { "name", QLIT_QSTR("0") },
+ { }
+ })),
+ ...
+ { }
+ }));
diff --git a/docs/interop/qmp-spec.txt b/docs/interop/qmp-spec.txt
index f8b5356015..6fa193a80b 100644
--- a/docs/interop/qmp-spec.txt
+++ b/docs/interop/qmp-spec.txt
@@ -83,16 +83,27 @@ The greeting message format is:
2.2.1 Capabilities
------------------
-As of the date this document was last revised, no server or client
-capability strings have been defined.
+Currently supported capabilities are:
+- "oob": the QMP server supports "Out-Of-Band" (OOB) command
+ execution. For more details, please see the "run-oob" parameter in
+ the "Issuing Commands" section below. Not all commands allow this
+ "oob" execution. The "query-qmp-schema" command can be used to
+ inspect which commands support "oob" execution.
+
+QMP clients can get a list of supported QMP capabilities of the QMP
+server in the greeting message mentioned above. By default, all the
+capabilities are off. To enable any QMP capabilities, the QMP client
+needs to send the "qmp_capabilities" command with an extra parameter
+for the requested capabilities.
2.3 Issuing Commands
--------------------
The format for command execution is:
-{ "execute": json-string, "arguments": json-object, "id": json-value }
+{ "execute": json-string, "arguments": json-object, "id": json-value,
+ "control": json-object }
Where,
@@ -102,10 +113,16 @@ The format for command execution is:
required. Each command documents what contents will be considered
valid when handling the json-argument
- The "id" member is a transaction identification associated with the
- command execution, it is optional and will be part of the response if
- provided. The "id" member can be any json-value, although most
- clients merely use a json-number incremented for each successive
- command
+ command execution. It is required for all commands if the OOB
+ capability was enabled at startup, and optional otherwise. The same
+ "id" field will be part of the response if provided. The "id" member
+ can be any json-value, although most clients merely use a
+ json-number incremented for each successive command
+- The "control" member is optional, and currently only used for
+ out-of-band execution. The handling or response of an "oob" command
+ can overtake prior in-band commands. To enable "oob" handling of a
+ particular command, just provide a control field with: { "control":
+ { "run-oob": true } }
2.4 Commands Responses
----------------------
@@ -113,6 +130,11 @@ The format for command execution is:
There are two possible responses which the Server will issue as the result
of a command execution: success or error.
+If a command was issued with a proper "id" field, the same "id" field
+will be attached to the corresponding response message so that requests
+and responses can be matched. Clients should drop all responses that
+have an unknown "id" field.
+
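A client-side sketch of this matching rule, in C: the client records the "id" of every request it sends and, when a response arrives, accepts it only if that "id" is still pending. Because responses are matched by "id" rather than by arrival order, an out-of-band response that overtakes an earlier in-band one is still paired correctly. JSON parsing is omitted; ids are assumed to be json-numbers already extracted from the messages, and PendingTable, track_request and match_response are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 16

/* Hypothetical table of request ids still awaiting a response. */
typedef struct {
    long ids[MAX_PENDING];
    bool used[MAX_PENDING];
} PendingTable;

/* Remember the id of a request just sent; false if the table is full. */
static bool track_request(PendingTable *t, long id)
{
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (!t->used[i]) {
            t->ids[i] = id;
            t->used[i] = true;
            return true;
        }
    }
    return false;
}

/*
 * Match a response to its request. Returns true and retires the id if it
 * was pending; returns false for an unknown id, which the client drops.
 * Responses may arrive in any order, e.g. an OOB reply overtaking an
 * earlier in-band one.
 */
static bool match_response(PendingTable *t, long id)
{
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (t->used[i] && t->ids[i] == id) {
            t->used[i] = false;
            return true;
        }
    }
    return false;
}
```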
2.4.1 success
-------------
diff --git a/docs/replay.txt b/docs/replay.txt
index 486c1e0e9d..2e21e9ccb0 100644
--- a/docs/replay.txt
+++ b/docs/replay.txt
@@ -7,14 +7,10 @@ See the COPYING file in the top-level directory.
Record/replay
-------------
-Record/replay functions are used for the reverse execution and deterministic
-replay of qemu execution. This implementation of deterministic replay can
-be used for deterministic debugging of guest code through a gdb remote
-interface.
-
+Record/replay functions are used for the deterministic replay of qemu execution.
Execution recording writes a non-deterministic events log, which can be later
used for replaying the execution anywhere and for unlimited number of times.
-It also supports checkpointing for faster rewinding during reverse debugging.
+It also supports checkpointing for faster rewinding to a specific replay moment.
Execution replaying reads the log and replays all non-deterministic events
including external input, hardware clocks, and interrupts.
@@ -28,16 +24,36 @@ Deterministic replay has the following features:
input devices.
Usage of the record/replay:
- * First, record the execution, by adding the following arguments to the command line:
- '-icount shift=7,rr=record,rrfile=replay.bin -net none'.
- Block devices' images are not actually changed in the recording mode,
+ * First, record the execution with the following command line:
+ qemu-system-i386 \
+ -icount shift=7,rr=record,rrfile=replay.bin \
+ -drive file=disk.qcow2,if=none,id=img-direct \
+ -drive driver=blkreplay,if=none,image=img-direct,id=img-blkreplay \
+ -device ide-hd,drive=img-blkreplay \
+ -netdev user,id=net1 -device rtl8139,netdev=net1 \
+ -object filter-replay,id=replay,netdev=net1
+ * After recording, you can replay it by using another command line:
+ qemu-system-i386 \
+ -icount shift=7,rr=replay,rrfile=replay.bin \
+ -drive file=disk.qcow2,if=none,id=img-direct \
+ -drive driver=blkreplay,if=none,image=img-direct,id=img-blkreplay \
+ -device ide-hd,drive=img-blkreplay \
+ -netdev user,id=net1 -device rtl8139,netdev=net1 \
+ -object filter-replay,id=replay,netdev=net1
+ The only difference from the recording command line is that the rr
+ option is changed from record to replay.
+ * Block device images are not actually changed in the recording mode,
because all of the changes are written to the temporary overlay file.
- * Then you can replay it by using another command
- line option: '-icount shift=7,rr=replay,rrfile=replay.bin -net none'
- * '-net none' option should also be specified if network replay patches
- are not applied.
-
-Papers with description of deterministic replay implementation:
+ This behavior is enabled by using the blkreplay driver. It should be used
+ for every enabled block device, as described in the 'Block devices' section.
+ * The '-net none' option should be specified when networking is not used,
+ because QEMU adds a network card by default. When networking is needed,
+ it should be configured explicitly with a replay filter, as described
+ in the 'Network devices' section.
+ * Interactions with audio devices and serial ports are recorded and
+ replayed automatically when such devices are enabled.
+
+Academic papers describing the deterministic replay implementation:
http://www.computer.org/csdl/proceedings/csmr/2012/4666/00/4666a553-abs.html
http://dl.acm.org/citation.cfm?id=2786805.2803179
@@ -46,8 +62,33 @@ Modifications of qemu include:
* saving different asynchronous events (e.g. system shutdown) into the log
* synchronization of the bottom halves execution
* synchronization of the threads from thread pool
- * recording/replaying user input (mouse and keyboard)
+ * recording/replaying user input (mouse, keyboard, and microphone)
* adding internal checkpoints for cpu and io synchronization
+ * network filter for recording and replaying the packets
+ * block driver for making block layer deterministic
+ * serial port input record and replay
+
+Locking and thread synchronisation
+----------------------------------
+
+Previously the synchronisation of the main thread and the vCPU thread
+was ensured by holding the BQL. However, the trend has been to
+reduce the time the BQL is held across the system, including under TCG
+system emulation. As it is important that batches of events are kept
+in sequence (e.g. expiring timers and checkpoints in the main thread
+while instruction checkpoints are written by the vCPU thread) we need
+another lock to keep things in lock-step. This role is now handled by
+the replay_mutex_lock. It used to be held only for each event being
+written but now it is held for a whole execution period. This results
+in a deterministic ping-pong between the two main threads.
+
+As the BQL is now a finer-grained lock than the replay_lock, it is almost
+certainly a bug, and a source of deadlocks, to take the
+replay_mutex_lock while the BQL is held. This is enforced by an assert.
+While the unlocks are usually in the reverse order, this is not
+necessary; you can drop the replay_lock while holding the BQL, without
+doing a more complicated unlock_iothread/replay_unlock/lock_iothread
+sequence.
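The ordering rule can be sketched with two plain pthread mutexes standing in for the replay mutex and the BQL. This is not QEMU's code: replay_lock_sketch, bql_lock_sketch, execution_period_sketch and the thread-local bql_held flag are hypothetical, and the assert only mirrors the one the text describes. The replay mutex is taken before the BQL, and it may be released while the BQL is still held.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the replay mutex and the BQL. */
static pthread_mutex_t replay_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t bql = PTHREAD_MUTEX_INITIALIZER;
static _Thread_local bool bql_held;  /* does this thread hold the BQL? */

static void bql_lock_sketch(void)
{
    pthread_mutex_lock(&bql);
    bql_held = true;
}

static void bql_unlock_sketch(void)
{
    bql_held = false;
    pthread_mutex_unlock(&bql);
}

/* Taking the coarser replay mutex while holding the BQL risks deadlock. */
static void replay_lock_sketch(void)
{
    assert(!bql_held);
    pthread_mutex_lock(&replay_mutex);
}

static void replay_unlock_sketch(void)
{
    pthread_mutex_unlock(&replay_mutex);
}

/* One execution period in the allowed order; true when fully unlocked. */
static bool execution_period_sketch(void)
{
    replay_lock_sketch();    /* replay mutex first ... */
    bql_lock_sketch();       /* ... then the BQL */
    /* ... emulation work with both locks held ... */
    replay_unlock_sketch();  /* allowed: dropped while the BQL is held */
    bql_unlock_sketch();
    return !bql_held;
}
```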
Non-deterministic events
------------------------
@@ -55,12 +96,11 @@ Non-deterministic events
Our record/replay system is based on saving and replaying non-deterministic
events (e.g. keyboard input) and simulating deterministic ones (e.g. reading
from HDD or memory of the VM). Saving only non-deterministic events makes
-log file smaller, simulation faster, and allows using reverse debugging even
-for realtime applications.
+log file smaller and simulation faster.
The following non-deterministic data from peripheral devices is saved into
the log: mouse and keyboard input, network packets, audio controller input,
-USB packets, serial port input, and hardware clocks (they are non-deterministic
+serial port input, and hardware clocks (they are non-deterministic
too, because their values are taken from the host machine). Inputs from
simulated hardware, memory of VM, software interrupts, and execution of
instructions are not saved into the log, because they are deterministic and
@@ -183,7 +223,7 @@ Block devices record/replay module intercepts calls of
bdrv coroutine functions at the top of block drivers stack.
To record and replay block operations the drive must be configured
as following:
- -drive file=disk.qcow,if=none,id=img-direct
+ -drive file=disk.qcow2,if=none,id=img-direct
-drive driver=blkreplay,if=none,image=img-direct,id=img-blkreplay
-device ide-hd,drive=img-blkreplay
@@ -212,6 +252,12 @@ This snapshot is created at start of recording and restored at start
of replaying. It also can be loaded while replaying to roll back
the execution.
+Use the QEMU monitor to create additional snapshots. The 'savevm <name>'
+command creates a snapshot and 'loadvm <name>' restores it. To prevent
+corruption of the original disk image, use overlay files linked to the
+original images. This way all new snapshots (including the starting one)
+are saved in overlays and the original image remains unchanged.
+
Network devices
---------------
@@ -232,3 +278,80 @@ Audio devices
Audio data is recorded and replay automatically. The command line for recording
and replaying must contain identical specifications of audio hardware, e.g.:
-soundhw ac97
+
+Serial ports
+------------
+
+Serial port input is recorded and replayed automatically. The command
+lines for recording and replaying must contain an identical number of
+ports, but their backends may differ.
+E.g., '-serial stdio' in record mode, and '-serial null' in replay mode.
+
+Replay log format
+-----------------
+
+The record/replay log consists of a header and a sequence of execution
+events. The header includes a 4-byte replay version id and an 8-byte
+reserved field. The version is updated every time the replay log format
+changes, to prevent using a replay log created by another build of qemu.
+
+The sequence of the events describes virtual machine state changes.
+It includes all non-deterministic inputs of VM, synchronization marks and
+instruction counts used to correctly inject inputs at replay.
+
+Synchronization marks (checkpoints) are used for synchronizing qemu threads
+that perform operations with virtual hardware. These operations may change
+the system's state (e.g., change a register or generate an interrupt) and
+therefore should execute synchronously with the CPU thread.
+
+Every event in the log includes a 1-byte event id and optional arguments.
+When an argument is an array, it is stored as a 4-byte array length
+followed by the corresponding number of bytes of data.
+Here is the list of events that are written into the log:
+
+ - EVENT_INSTRUCTION. Instructions executed since last event.
+ Argument: 4-byte number of executed instructions.
+ - EVENT_INTERRUPT. Used to synchronize interrupt processing.
+ - EVENT_EXCEPTION. Used to synchronize exception handling.
+ - EVENT_ASYNC. This is a group of events. They are always processed
+ together with checkpoints. When such an event is generated, it is
+ stored in the queue and processed only when checkpoint occurs.
+ Every such event is followed by 1-byte checkpoint id and 1-byte
+ async event id from the following list:
+ - REPLAY_ASYNC_EVENT_BH. Bottom-half callback. This event synchronizes
+ callbacks that affect virtual machine state, but are normally called
+ asynchronously.
+ Argument: 8-byte operation id.
+ - REPLAY_ASYNC_EVENT_INPUT. Input device event. Contains
+ parameters of keyboard and mouse input operations
+ (key press/release, mouse pointer movement).
+ Arguments: 9-16 bytes depending on the input event.
+ - REPLAY_ASYNC_EVENT_INPUT_SYNC. Internal input synchronization event.
+ - REPLAY_ASYNC_EVENT_CHAR_READ. Character (e.g., serial port) device input
+ initiated by the sender.
+ Arguments: 1-byte character device id.
+ Array with the bytes that were read.
+ - REPLAY_ASYNC_EVENT_BLOCK. Block device operation. Used to synchronize
+ operations with disk and flash drives with CPU.
+ Argument: 8-byte operation id.
+ - REPLAY_ASYNC_EVENT_NET. Incoming network packet.
+ Arguments: 1-byte network adapter id.
+ 4-byte packet flags.
+ Array with packet bytes.
+ - EVENT_SHUTDOWN. Occurs when the user sends a shutdown event to qemu,
+ e.g., by closing the window.
+ - EVENT_CHAR_WRITE. Used to synchronize character output operations.
+ Arguments: 4-byte output function return value.
+ 4-byte offset in the output array.
+ - EVENT_CHAR_READ_ALL. Used to synchronize character input operations,
+ initiated by qemu.
+ Argument: Array with bytes that were read.
+ - EVENT_CHAR_READ_ALL_ERROR. Unsuccessful character input operation,
+ initiated by qemu.
+ Argument: 4-byte error code.
+ - EVENT_CLOCK + clock_id. Group of events for host clock read operations.
+ Argument: 8-byte clock value.
+ - EVENT_CHECKPOINT + checkpoint_id. Checkpoint for synchronization of
+ CPU, internal threads, and asynchronous input events. May be followed
+ by one or more EVENT_ASYNC events.
+ - EVENT_END. Last event in the log.
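A minimal decoding sketch for the header and the array encoding described above, reading from an in-memory buffer. The text does not state the byte order, so this sketch assumes multi-byte fields are stored most significant byte first. LogReader, read_u32, read_u64, read_header and read_array are hypothetical names. Event ids are not decoded because the text does not give their numeric values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cursor over a replay log held in memory. */
typedef struct {
    const uint8_t *buf;
    size_t len;
    size_t pos;
} LogReader;

/* Read a 4-byte value; big-endian byte order is an assumption. */
static bool read_u32(LogReader *r, uint32_t *out)
{
    if (r->len - r->pos < 4) {
        return false;
    }
    const uint8_t *p = r->buf + r->pos;
    *out = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
    r->pos += 4;
    return true;
}

/* Read an 8-byte value as two 4-byte halves, high half first. */
static bool read_u64(LogReader *r, uint64_t *out)
{
    uint32_t hi, lo;
    if (!read_u32(r, &hi) || !read_u32(r, &lo)) {
        return false;
    }
    *out = ((uint64_t)hi << 32) | lo;
    return true;
}

/* Header: 4-byte version id, then an 8-byte reserved field. */
static bool read_header(LogReader *r, uint32_t expected_version)
{
    uint32_t version;
    uint64_t reserved;
    if (!read_u32(r, &version) || !read_u64(r, &reserved)) {
        return false;
    }
    return version == expected_version;  /* reject logs from other builds */
}

/* Array argument: 4-byte length, then that many bytes of data. */
static bool read_array(LogReader *r, const uint8_t **data, uint32_t *size)
{
    if (!read_u32(r, size) || r->len - r->pos < *size) {
        return false;
    }
    *data = r->buf + r->pos;
    r->pos += *size;
    return true;
}
```

A full reader would then loop: read the 1-byte event id, dispatch on it, and read that event's arguments with these helpers until EVENT_END.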