diff options
author | Paolo Bonzini <pbonzini@redhat.com> | 2017-06-06 16:46:26 +0200 |
---|---|---|
committer | Paolo Bonzini <pbonzini@redhat.com> | 2017-06-07 18:22:03 +0200 |
commit | ac06724a715864942e2b5e28f92d5d5421f0a0b0 (patch) | |
tree | 8eeb9a6aeff09669b65573b1d856426cdf87d8bd /docs/atomics.txt | |
parent | 90bb0c04214545beb75044a2742f711335103269 (diff) |
docs: create config/, devel/ and spin/ subdirectories
Developer documentation should be its own manual. As a start, move all
developer-oriented files to a separate directory.
Also move non-text files to their own directories: docs/config/ for
QEMU -readconfig input, and docs/spin/ for formal models to be used
with the SPIN model checker.
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Diffstat (limited to 'docs/atomics.txt')
-rw-r--r-- | docs/atomics.txt | 388 |
1 files changed, 0 insertions, 388 deletions
diff --git a/docs/atomics.txt b/docs/atomics.txt deleted file mode 100644 index 3ef5d85b1b..0000000000 --- a/docs/atomics.txt +++ /dev/null @@ -1,388 +0,0 @@ -CPUs perform independent memory operations effectively in random order. -but this can be a problem for CPU-CPU interaction (including interactions -between QEMU and the guest). Multi-threaded programs use various tools -to instruct the compiler and the CPU to restrict the order to something -that is consistent with the expectations of the programmer. - -The most basic tool is locking. Mutexes, condition variables and -semaphores are used in QEMU, and should be the default approach to -synchronization. Anything else is considerably harder, but it's -also justified more often than one would like. The two tools that -are provided by qemu/atomic.h are memory barriers and atomic operations. - -Macros defined by qemu/atomic.h fall in three camps: - -- compiler barriers: barrier(); - -- weak atomic access and manual memory barriers: atomic_read(), - atomic_set(), smp_rmb(), smp_wmb(), smp_mb(), smp_mb_acquire(), - smp_mb_release(), smp_read_barrier_depends(); - -- sequentially consistent atomic access: everything else. - - -COMPILER MEMORY BARRIER -======================= - -barrier() prevents the compiler from moving the memory accesses either -side of it to the other side. The compiler barrier has no direct effect -on the CPU, which may then reorder things however it wishes. - -barrier() is mostly used within qemu/atomic.h itself. On some -architectures, CPU guarantees are strong enough that blocking compiler -optimizations already ensures the correct order of execution. In this -case, qemu/atomic.h will reduce stronger memory barriers to simple -compiler barriers. - -Still, barrier() can be useful when writing code that can be interrupted -by signal handlers. - - -SEQUENTIALLY CONSISTENT ATOMIC ACCESS -===================================== - -Most of the operations in the qemu/atomic.h header ensure *sequential -consistency*, where "the result of any execution is the same as if the -operations of all the processors were executed in some sequential order, -and the operations of each individual processor appear in this sequence -in the order specified by its program". - -qemu/atomic.h provides the following set of atomic read-modify-write -operations: - - void atomic_inc(ptr) - void atomic_dec(ptr) - void atomic_add(ptr, val) - void atomic_sub(ptr, val) - void atomic_and(ptr, val) - void atomic_or(ptr, val) - - typeof(*ptr) atomic_fetch_inc(ptr) - typeof(*ptr) atomic_fetch_dec(ptr) - typeof(*ptr) atomic_fetch_add(ptr, val) - typeof(*ptr) atomic_fetch_sub(ptr, val) - typeof(*ptr) atomic_fetch_and(ptr, val) - typeof(*ptr) atomic_fetch_or(ptr, val) - typeof(*ptr) atomic_xchg(ptr, val) - typeof(*ptr) atomic_cmpxchg(ptr, old, new) - -all of which return the old value of *ptr. These operations are -polymorphic; they operate on any type that is as wide as an int. - -Sequentially consistent loads and stores can be done using: - - atomic_fetch_add(ptr, 0) for loads - atomic_xchg(ptr, val) for stores - -However, they are quite expensive on some platforms, notably POWER and -ARM. Therefore, qemu/atomic.h provides two primitives with slightly -weaker constraints: - - typeof(*ptr) atomic_mb_read(ptr) - void atomic_mb_set(ptr, val) - -The semantics of these primitives map to Java volatile variables, -and are strongly related to memory barriers as used in the Linux -kernel (see below). - -As long as you use atomic_mb_read and atomic_mb_set, accesses cannot -be reordered with each other, and it is also not possible to reorder -"normal" accesses around them. - -However, and this is the important difference between -atomic_mb_read/atomic_mb_set and sequential consistency, it is important -for both threads to access the same volatile variable. It is not the -case that everything visible to thread A when it writes volatile field f -becomes visible to thread B after it reads volatile field g. The store -and load have to "match" (i.e., be performed on the same volatile -field) to achieve the right semantics. - - -These operations operate on any type that is as wide as an int or smaller. - - -WEAK ATOMIC ACCESS AND MANUAL MEMORY BARRIERS -============================================= - -Compared to sequentially consistent atomic access, programming with -weaker consistency models can be considerably more complicated. -In general, if the algorithm you are writing includes both writes -and reads on the same side, it is generally simpler to use sequentially -consistent primitives. - -When using this model, variables are accessed with atomic_read() and -atomic_set(), and restrictions to the ordering of accesses is enforced -using the memory barrier macros: smp_rmb(), smp_wmb(), smp_mb(), -smp_mb_acquire(), smp_mb_release(), smp_read_barrier_depends(). - -atomic_read() and atomic_set() prevents the compiler from using -optimizations that might otherwise optimize accesses out of existence -on the one hand, or that might create unsolicited accesses on the other. -In general this should not have any effect, because the same compiler -barriers are already implied by memory barriers. However, it is useful -to do so, because it tells readers which variables are shared with -other threads, and which are local to the current thread or protected -by other, more mundane means. - -Memory barriers control the order of references to shared memory. -They come in six kinds: - -- smp_rmb() guarantees that all the LOAD operations specified before - the barrier will appear to happen before all the LOAD operations - specified after the barrier with respect to the other components of - the system. - - In other words, smp_rmb() puts a partial ordering on loads, but is not - required to have any effect on stores. - -- smp_wmb() guarantees that all the STORE operations specified before - the barrier will appear to happen before all the STORE operations - specified after the barrier with respect to the other components of - the system. - - In other words, smp_wmb() puts a partial ordering on stores, but is not - required to have any effect on loads. - -- smp_mb_acquire() guarantees that all the LOAD operations specified before - the barrier will appear to happen before all the LOAD or STORE operations - specified after the barrier with respect to the other components of - the system. - -- smp_mb_release() guarantees that all the STORE operations specified *after* - the barrier will appear to happen after all the LOAD or STORE operations - specified *before* the barrier with respect to the other components of - the system. - -- smp_mb() guarantees that all the LOAD and STORE operations specified - before the barrier will appear to happen before all the LOAD and - STORE operations specified after the barrier with respect to the other - components of the system. - - smp_mb() puts a partial ordering on both loads and stores. It is - stronger than both a read and a write memory barrier; it implies both - smp_mb_acquire() and smp_mb_release(), but it also prevents STOREs - coming before the barrier from overtaking LOADs coming after the - barrier and vice versa. - -- smp_read_barrier_depends() is a weaker kind of read barrier. On - most processors, whenever two loads are performed such that the - second depends on the result of the first (e.g., the first load - retrieves the address to which the second load will be directed), - the processor will guarantee that the first LOAD will appear to happen - before the second with respect to the other components of the system. - However, this is not always true---for example, it was not true on - Alpha processors. Whenever this kind of access happens to shared - memory (that is not protected by a lock), a read barrier is needed, - and smp_read_barrier_depends() can be used instead of smp_rmb(). - - Note that the first load really has to have a _data_ dependency and not - a control dependency. If the address for the second load is dependent - on the first load, but the dependency is through a conditional rather - than actually loading the address itself, then it's a _control_ - dependency and a full read barrier or better is required. - - -This is the set of barriers that is required *between* two atomic_read() -and atomic_set() operations to achieve sequential consistency: - - | 2nd operation | - |-----------------------------------------------| - 1st operation | (after last) | atomic_read | atomic_set | - ---------------+----------------+-------------+----------------| - (before first) | | none | smp_mb_release | - ---------------+----------------+-------------+----------------| - atomic_read | smp_mb_acquire | smp_rmb | ** | - ---------------+----------------+-------------+----------------| - atomic_set | none | smp_mb()*** | smp_wmb() | - ---------------+----------------+-------------+----------------| - - * Or smp_read_barrier_depends(). - - ** This requires a load-store barrier. This is achieved by - either smp_mb_acquire() or smp_mb_release(). - - *** This requires a store-load barrier. On most machines, the only - way to achieve this is a full barrier. - - -You can see that the two possible definitions of atomic_mb_read() -and atomic_mb_set() are the following: - - 1) atomic_mb_read(p) = atomic_read(p); smp_mb_acquire() - atomic_mb_set(p, v) = smp_mb_release(); atomic_set(p, v); smp_mb() - - 2) atomic_mb_read(p) = smp_mb() atomic_read(p); smp_mb_acquire() - atomic_mb_set(p, v) = smp_mb_release(); atomic_set(p, v); - -Usually the former is used, because smp_mb() is expensive and a program -normally has more reads than writes. Therefore it makes more sense to -make atomic_mb_set() the more expensive operation. - -There are two common cases in which atomic_mb_read and atomic_mb_set -generate too many memory barriers, and thus it can be useful to manually -place barriers instead: - -- when a data structure has one thread that is always a writer - and one thread that is always a reader, manual placement of - memory barriers makes the write side faster. Furthermore, - correctness is easy to check for in this case using the "pairing" - trick that is explained below: - - thread 1 thread 1 - ------------------------- ------------------------ - (other writes) - smp_mb_release() - atomic_mb_set(&a, x) atomic_set(&a, x) - smp_wmb() - atomic_mb_set(&b, y) atomic_set(&b, y) - - => - thread 2 thread 2 - ------------------------- ------------------------ - y = atomic_mb_read(&b) y = atomic_read(&b) - smp_rmb() - x = atomic_mb_read(&a) x = atomic_read(&a) - smp_mb_acquire() - - Note that the barrier between the stores in thread 1, and between - the loads in thread 2, has been optimized here to a write or a - read memory barrier respectively. On some architectures, notably - ARMv7, smp_mb_acquire and smp_mb_release are just as expensive as - smp_mb, but smp_rmb and/or smp_wmb are more efficient. - -- sometimes, a thread is accessing many variables that are otherwise - unrelated to each other (for example because, apart from the current - thread, exactly one other thread will read or write each of these - variables). In this case, it is possible to "hoist" the implicit - barriers provided by atomic_mb_read() and atomic_mb_set() outside - a loop. For example, the above definition atomic_mb_read() gives - the following transformation: - - n = 0; n = 0; - for (i = 0; i < 10; i++) => for (i = 0; i < 10; i++) - n += atomic_mb_read(&a[i]); n += atomic_read(&a[i]); - smp_mb_acquire(); - - Similarly, atomic_mb_set() can be transformed as follows: - smp_mb(): - - smp_mb_release(); - for (i = 0; i < 10; i++) => for (i = 0; i < 10; i++) - atomic_mb_set(&a[i], false); atomic_set(&a[i], false); - smp_mb(); - - -The two tricks can be combined. In this case, splitting a loop in -two lets you hoist the barriers out of the loops _and_ eliminate the -expensive smp_mb(): - - smp_mb_release(); - for (i = 0; i < 10; i++) { => for (i = 0; i < 10; i++) - atomic_mb_set(&a[i], false); atomic_set(&a[i], false); - atomic_mb_set(&b[i], false); smb_wmb(); - } for (i = 0; i < 10; i++) - atomic_set(&a[i], false); - smp_mb(); - - The other thread can still use atomic_mb_read()/atomic_mb_set() - - -Memory barrier pairing ----------------------- - -A useful rule of thumb is that memory barriers should always, or almost -always, be paired with another barrier. In the case of QEMU, however, -note that the other barrier may actually be in a driver that runs in -the guest! - -For the purposes of pairing, smp_read_barrier_depends() and smp_rmb() -both count as read barriers. A read barrier shall pair with a write -barrier or a full barrier; a write barrier shall pair with a read -barrier or a full barrier. A full barrier can pair with anything. -For example: - - thread 1 thread 2 - =============== =============== - a = 1; - smp_wmb(); - b = 2; x = b; - smp_rmb(); - y = a; - -Note that the "writing" thread is accessing the variables in the -opposite order as the "reading" thread. This is expected: stores -before the write barrier will normally match the loads after the -read barrier, and vice versa. The same is true for more than 2 -access and for data dependency barriers: - - thread 1 thread 2 - =============== =============== - b[2] = 1; - smp_wmb(); - x->i = 2; - smp_wmb(); - a = x; x = a; - smp_read_barrier_depends(); - y = x->i; - smp_read_barrier_depends(); - z = b[y]; - -smp_wmb() also pairs with atomic_mb_read() and smp_mb_acquire(). -and smp_rmb() also pairs with atomic_mb_set() and smp_mb_release(). - - -COMPARISON WITH LINUX KERNEL MEMORY BARRIERS -============================================ - -Here is a list of differences between Linux kernel atomic operations -and memory barriers, and the equivalents in QEMU: - -- atomic operations in Linux are always on a 32-bit int type and - use a boxed atomic_t type; atomic operations in QEMU are polymorphic - and use normal C types. - -- Originally, atomic_read and atomic_set in Linux gave no guarantee - at all. Linux 4.1 updated them to implement volatile - semantics via ACCESS_ONCE (or the more recent READ/WRITE_ONCE). - - QEMU's atomic_read/set implement, if the compiler supports it, C11 - atomic relaxed semantics, and volatile semantics otherwise. - Both semantics prevent the compiler from doing certain transformations; - the difference is that atomic accesses are guaranteed to be atomic, - while volatile accesses aren't. Thus, in the volatile case we just cross - our fingers hoping that the compiler will generate atomic accesses, - since we assume the variables passed are machine-word sized and - properly aligned. - No barriers are implied by atomic_read/set in either Linux or QEMU. - -- atomic read-modify-write operations in Linux are of three kinds: - - atomic_OP returns void - atomic_OP_return returns new value of the variable - atomic_fetch_OP returns the old value of the variable - atomic_cmpxchg returns the old value of the variable - - In QEMU, the second kind does not exist. Currently Linux has - atomic_fetch_or only. QEMU provides and, or, inc, dec, add, sub. - -- different atomic read-modify-write operations in Linux imply - a different set of memory barriers; in QEMU, all of them enforce - sequential consistency, which means they imply full memory barriers - before and after the operation. - -- Linux does not have an equivalent of atomic_mb_set(). In particular, - note that smp_store_mb() is a little weaker than atomic_mb_set(). - atomic_mb_read() compiles to the same instructions as Linux's - smp_load_acquire(), but this should be treated as an implementation - detail. QEMU does have atomic_load_acquire() and atomic_store_release() - macros, but for now they are only used within atomic.h. This may - change in the future. - - -SOURCES -======= - -* Documentation/memory-barriers.txt from the Linux kernel - -* "The JSR-133 Cookbook for Compiler Writers", available at - http://g.oswego.edu/dl/jmm/cookbook.html |