aboutsummaryrefslogtreecommitdiff
path: root/docs/rcu.txt
diff options
context:
space:
mode:
authorPaolo Bonzini <pbonzini@redhat.com>2017-06-06 16:46:26 +0200
committerPaolo Bonzini <pbonzini@redhat.com>2017-06-07 18:22:03 +0200
commitac06724a715864942e2b5e28f92d5d5421f0a0b0 (patch)
tree8eeb9a6aeff09669b65573b1d856426cdf87d8bd /docs/rcu.txt
parent90bb0c04214545beb75044a2742f711335103269 (diff)
docs: create config/, devel/ and spin/ subdirectories
Developer documentation should be its own manual. As a start, move all developer-oriented files to a separate directory. Also move non-text files to their own directories: docs/config/ for QEMU -readconfig input, and docs/spin/ for formal models to be used with the SPIN model checker. Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Diffstat (limited to 'docs/rcu.txt')
-rw-r--r--docs/rcu.txt390
1 files changed, 0 insertions, 390 deletions
diff --git a/docs/rcu.txt b/docs/rcu.txt
deleted file mode 100644
index c84e7f42b2..0000000000
--- a/docs/rcu.txt
+++ /dev/null
@@ -1,390 +0,0 @@
-Using RCU (Read-Copy-Update) for synchronization
-================================================
-
-Read-copy update (RCU) is a synchronization mechanism that is used to
-protect read-mostly data structures. RCU is very efficient and scalable
-on the read side (it is wait-free), and thus can make the read paths
-extremely fast.
-
-RCU supports concurrency between a single writer and multiple readers,
-thus it is not used alone. Typically, the write-side will use a lock to
-serialize multiple updates, but other approaches are possible (e.g.,
-restricting updates to a single task). In QEMU, when a lock is used,
-this will often be the "iothread mutex", also known as the "big QEMU
-lock" (BQL). Also, restricting updates to a single task is done in
-QEMU using the "bottom half" API.
-
-RCU is fundamentally a "wait-to-finish" mechanism. The read side marks
-sections of code with "critical sections", and the update side will wait
-for the execution of all *currently running* critical sections before
-proceeding, or before asynchronously executing a callback.
-
-The key point here is that only the currently running critical sections
-are waited for; critical sections that are started _after_ the beginning
-of the wait do not extend the wait, despite running concurrently with
-the updater. This is the reason why RCU is more scalable than,
-for example, reader-writer locks. It is so much more scalable that
-the system will have a single instance of the RCU mechanism; a single
-mechanism can be used for an arbitrary number of "things", without
-having to worry about things such as contention or deadlocks.
-
-How is this possible? The basic idea is to split updates in two phases,
-"removal" and "reclamation". During removal, we ensure that subsequent
-readers will not be able to get a reference to the old data. After
-removal has completed, a critical section will not be able to access
-the old data. Therefore, critical sections that begin after removal
-do not matter; as soon as all previous critical sections have finished,
-there cannot be any readers who hold references to the data structure,
-and these can now be safely reclaimed (e.g., freed or unref'ed).
-
-Here is a picture:
-
- thread 1 thread 2 thread 3
- ------------------- ------------------------ -------------------
- enter RCU crit.sec.
- | finish removal phase
- | begin wait
- | | enter RCU crit.sec.
- exit RCU crit.sec | |
- complete wait |
- begin reclamation phase |
- exit RCU crit.sec.
-
-
-Note how thread 3 is still executing its critical section when thread 2
-starts reclaiming data. This is possible, because the old version of the
-data structure was not accessible at the time thread 3 began executing
-that critical section.
-
-
-RCU API
-=======
-
-The core RCU API is small:
-
- void rcu_read_lock(void);
-
- Used by a reader to inform the reclaimer that the reader is
- entering an RCU read-side critical section.
-
- void rcu_read_unlock(void);
-
- Used by a reader to inform the reclaimer that the reader is
- exiting an RCU read-side critical section. Note that RCU
- read-side critical sections may be nested and/or overlapping.
-
- void synchronize_rcu(void);
-
- Blocks until all pre-existing RCU read-side critical sections
- on all threads have completed. This marks the end of the removal
- phase and the beginning of reclamation phase.
-
- Note that it would be valid for another update to come while
- synchronize_rcu is running. Because of this, it is better that
- the updater releases any locks it may hold before calling
- synchronize_rcu. If this is not possible (for example, because
- the updater is protected by the BQL), you can use call_rcu.
-
- void call_rcu1(struct rcu_head * head,
- void (*func)(struct rcu_head *head));
-
- This function invokes func(head) after all pre-existing RCU
- read-side critical sections on all threads have completed. This
- marks the end of the removal phase, with func taking care
- asynchronously of the reclamation phase.
-
- The foo struct needs to have an rcu_head structure added,
- perhaps as follows:
-
- struct foo {
- struct rcu_head rcu;
- int a;
- char b;
- long c;
- };
-
- so that the reclaimer function can fetch the struct foo address
- and free it:
-
- call_rcu1(&foo.rcu, foo_reclaim);
-
- void foo_reclaim(struct rcu_head *rp)
- {
- struct foo *fp = container_of(rp, struct foo, rcu);
- g_free(fp);
- }
-
- For the common case where the rcu_head member is the first of the
- struct, you can use the following macro.
-
- void call_rcu(T *p,
- void (*func)(T *p),
- field-name);
- void g_free_rcu(T *p,
- field-name);
-
- call_rcu1 is typically used through these macro, in the common case
- where the "struct rcu_head" is the first field in the struct. If
- the callback function is g_free, in particular, g_free_rcu can be
- used. In the above case, one could have written simply:
-
- g_free_rcu(&foo, rcu);
-
- typeof(*p) atomic_rcu_read(p);
-
- atomic_rcu_read() is similar to atomic_mb_read(), but it makes
- some assumptions on the code that calls it. This allows a more
- optimized implementation.
-
- atomic_rcu_read assumes that whenever a single RCU critical
- section reads multiple shared data, these reads are either
- data-dependent or need no ordering. This is almost always the
- case when using RCU, because read-side critical sections typically
- navigate one or more pointers (the pointers that are changed on
- every update) until reaching a data structure of interest,
- and then read from there.
-
- RCU read-side critical sections must use atomic_rcu_read() to
- read data, unless concurrent writes are prevented by another
- synchronization mechanism.
-
- Furthermore, RCU read-side critical sections should traverse the
- data structure in a single direction, opposite to the direction
- in which the updater initializes it.
-
- void atomic_rcu_set(p, typeof(*p) v);
-
- atomic_rcu_set() is also similar to atomic_mb_set(), and it also
- makes assumptions on the code that calls it in order to allow a more
- optimized implementation.
-
- In particular, atomic_rcu_set() suffices for synchronization
- with readers, if the updater never mutates a field within a
- data item that is already accessible to readers. This is the
- case when initializing a new copy of the RCU-protected data
- structure; just ensure that initialization of *p is carried out
- before atomic_rcu_set() makes the data item visible to readers.
- If this rule is observed, writes will happen in the opposite
- order as reads in the RCU read-side critical sections (or if
- there is just one update), and there will be no need for other
- synchronization mechanism to coordinate the accesses.
-
-The following APIs must be used before RCU is used in a thread:
-
- void rcu_register_thread(void);
-
- Mark a thread as taking part in the RCU mechanism. Such a thread
- will have to report quiescent points regularly, either manually
- or through the QemuCond/QemuSemaphore/QemuEvent APIs.
-
- void rcu_unregister_thread(void);
-
- Mark a thread as not taking part anymore in the RCU mechanism.
- It is not a problem if such a thread reports quiescent points,
- either manually or by using the QemuCond/QemuSemaphore/QemuEvent
- APIs.
-
-Note that these APIs are relatively heavyweight, and should _not_ be
-nested.
-
-
-DIFFERENCES WITH LINUX
-======================
-
-- Waiting on a mutex is possible, though discouraged, within an RCU critical
- section. This is because spinlocks are rarely (if ever) used in userspace
- programming; not allowing this would prevent upgrading an RCU read-side
- critical section to become an updater.
-
-- atomic_rcu_read and atomic_rcu_set replace rcu_dereference and
- rcu_assign_pointer. They take a _pointer_ to the variable being accessed.
-
-- call_rcu is a macro that has an extra argument (the name of the first
- field in the struct, which must be a struct rcu_head), and expects the
- type of the callback's argument to be the type of the first argument.
- call_rcu1 is the same as Linux's call_rcu.
-
-
-RCU PATTERNS
-============
-
-Many patterns using read-writer locks translate directly to RCU, with
-the advantages of higher scalability and deadlock immunity.
-
-In general, RCU can be used whenever it is possible to create a new
-"version" of a data structure every time the updater runs. This may
-sound like a very strict restriction, however:
-
-- the updater does not mean "everything that writes to a data structure",
- but rather "everything that involves a reclamation step". See the
- array example below
-
-- in some cases, creating a new version of a data structure may actually
- be very cheap. For example, modifying the "next" pointer of a singly
- linked list is effectively creating a new version of the list.
-
-Here are some frequently-used RCU idioms that are worth noting.
-
-
-RCU list processing
--------------------
-
-TBD (not yet used in QEMU)
-
-
-RCU reference counting
-----------------------
-
-Because grace periods are not allowed to complete while there is an RCU
-read-side critical section in progress, the RCU read-side primitives
-may be used as a restricted reference-counting mechanism. For example,
-consider the following code fragment:
-
- rcu_read_lock();
- p = atomic_rcu_read(&foo);
- /* do something with p. */
- rcu_read_unlock();
-
-The RCU read-side critical section ensures that the value of "p" remains
-valid until after the rcu_read_unlock(). In some sense, it is acquiring
-a reference to p that is later released when the critical section ends.
-The write side looks simply like this (with appropriate locking):
-
- qemu_mutex_lock(&foo_mutex);
- old = foo;
- atomic_rcu_set(&foo, new);
- qemu_mutex_unlock(&foo_mutex);
- synchronize_rcu();
- free(old);
-
-If the processing cannot be done purely within the critical section, it
-is possible to combine this idiom with a "real" reference count:
-
- rcu_read_lock();
- p = atomic_rcu_read(&foo);
- foo_ref(p);
- rcu_read_unlock();
- /* do something with p. */
- foo_unref(p);
-
-The write side can be like this:
-
- qemu_mutex_lock(&foo_mutex);
- old = foo;
- atomic_rcu_set(&foo, new);
- qemu_mutex_unlock(&foo_mutex);
- synchronize_rcu();
- foo_unref(old);
-
-or with call_rcu:
-
- qemu_mutex_lock(&foo_mutex);
- old = foo;
- atomic_rcu_set(&foo, new);
- qemu_mutex_unlock(&foo_mutex);
- call_rcu(foo_unref, old, rcu);
-
-In both cases, the write side only performs removal. Reclamation
-happens when the last reference to a "foo" object is dropped.
-Using synchronize_rcu() is undesirably expensive, because the
-last reference may be dropped on the read side. Hence you can
-use call_rcu() instead:
-
- foo_unref(struct foo *p) {
- if (atomic_fetch_dec(&p->refcount) == 1) {
- call_rcu(foo_destroy, p, rcu);
- }
- }
-
-
-Note that the same idioms would be possible with reader/writer
-locks:
-
- read_lock(&foo_rwlock); write_mutex_lock(&foo_rwlock);
- p = foo; p = foo;
- /* do something with p. */ foo = new;
- read_unlock(&foo_rwlock); free(p);
- write_mutex_unlock(&foo_rwlock);
- free(p);
-
- ------------------------------------------------------------------
-
- read_lock(&foo_rwlock); write_mutex_lock(&foo_rwlock);
- p = foo; old = foo;
- foo_ref(p); foo = new;
- read_unlock(&foo_rwlock); foo_unref(old);
- /* do something with p. */ write_mutex_unlock(&foo_rwlock);
- read_lock(&foo_rwlock);
- foo_unref(p);
- read_unlock(&foo_rwlock);
-
-foo_unref could use a mechanism such as bottom halves to move deallocation
-out of the write-side critical section.
-
-
-RCU resizable arrays
---------------------
-
-Resizable arrays can be used with RCU. The expensive RCU synchronization
-(or call_rcu) only needs to take place when the array is resized.
-The two items to take care of are:
-
-- ensuring that the old version of the array is available between removal
- and reclamation;
-
-- avoiding mismatches in the read side between the array data and the
- array size.
-
-The first problem is avoided simply by not using realloc. Instead,
-each resize will allocate a new array and copy the old data into it.
-The second problem would arise if the size and the data pointers were
-two members of a larger struct:
-
- struct mystuff {
- ...
- int data_size;
- int data_alloc;
- T *data;
- ...
- };
-
-Instead, we store the size of the array with the array itself:
-
- struct arr {
- int size;
- int alloc;
- T data[];
- };
- struct arr *global_array;
-
- read side:
- rcu_read_lock();
- struct arr *array = atomic_rcu_read(&global_array);
- x = i < array->size ? array->data[i] : -1;
- rcu_read_unlock();
- return x;
-
- write side (running under a lock):
- if (global_array->size == global_array->alloc) {
- /* Creating a new version. */
- new_array = g_malloc(sizeof(struct arr) +
- global_array->alloc * 2 * sizeof(T));
- new_array->size = global_array->size;
- new_array->alloc = global_array->alloc * 2;
- memcpy(new_array->data, global_array->data,
- global_array->alloc * sizeof(T));
-
- /* Removal phase. */
- old_array = global_array;
- atomic_rcu_set(&new_array->data, new_array);
- synchronize_rcu();
-
- /* Reclamation phase. */
- free(old_array);
- }
-
-
-SOURCES
-=======
-
-* Documentation/RCU/ from the Linux kernel