diff options
author | Paolo Bonzini <pbonzini@redhat.com> | 2015-03-10 16:45:57 +0100 |
---|---|---|
committer | Stefan Hajnoczi <stefanha@redhat.com> | 2015-03-12 17:41:23 +0000 |
commit | 2120465fbb87a69bd60283ec4034a0963766a7ef (patch) | |
tree | 83f65b5c66e8416b2b584982a6478d9bbe608967 /include | |
parent | 2a5b58e2405e9fe42ba356b5a1b78146a4e9a659 (diff) |
queue: fix QSLIST_INSERT_HEAD_ATOMIC race
There is a not-so-subtle race in QSLIST_INSERT_HEAD_ATOMIC.
Because atomic_cmpxchg returns the old value instead of a success flag,
QSLIST_INSERT_HEAD_ATOMIC was checking for success by comparing against
the second argument to atomic_cmpxchg. Unfortunately, this only works
if the second argument is a local or thread-local variable.
If it is in memory, it can be subject to common subexpression elimination
(and then everything's fine) or reloaded after the atomic_cmpxchg,
depending on the compiler's whims. If the latter happens, the race can
happen. A thread can sneak in, doing something on elm->field.sle_next
after the atomic_cmpxchg and before the comparison. This causes a wrong
failure, and then two threads are using "elm" at the same time. In the
case discovered by Christian, the sequence was likely something like this:
thread 1 | thread 2
QSLIST_INSERT_HEAD_ATOMIC |
atomic_cmpxchg succeeds |
elm added to list |
| steal release_pool
| QSLIST_REMOVE_HEAD
| elm removed from list
| ...
| QSLIST_INSERT_HEAD_ATOMIC
| (overwrites sle_next)
spurious failure |
atomic_cmpxchg succeeds |
elm added to list again |
|
steal release_pool |
QSLIST_REMOVE_HEAD |
elm removed again |
The last three steps could be done by a third thread as well.
A reproducer that failed in a matter of seconds is as follows:
- the guest has 32 VCPUs on a 28 core host (hyperthreading was enabled),
memory was 16G just to err on the safe side (the host has 64G, but hey
at least you need no s390)
- the guest has 24 null-aio virtio-blk devices using dataplane
(-object iothread,id=ioN -drive if=none,id=blkN,driver=null-aio,size=500G
-device virtio-blk-pci,iothread=ioN,drive=blkN)
- the guest also has a single network interface. It's only doing loopback
tests so slirp vs. tap and the model doesn't matter.
- the guest is running fio with the following script:
[global]
rw=randread
blocksize=16k
ioengine=libaio
runtime=10m
buffered=0
fallocate=none
time_based
iodepth=32
[virtio1a]
filename=/dev/block/252\:16
[virtio1b]
filename=/dev/block/252\:16
...
[virtio24a]
filename=/dev/block/252\:384
[virtio24b]
filename=/dev/block/252\:384
[listen1]
protocol=tcp
ioengine=net
port=12345
listen
rw=read
bs=4k
size=1000g
[connect1]
protocol=tcp
hostname=localhost
ioengine=net
port=12345
protocol=tcp
rw=write
startdelay=1
size=1000g
...
[listen8]
protocol=tcp
ioengine=net
port=12352
listen
rw=read
bs=4k
size=1000g
[connect8]
protocol=tcp
hostname=localhost
ioengine=net
port=12352
rw=write
startdelay=1
size=1000g
Moral of the story: I should refrain from writing more clever stuff.
At least it looks like it is not too clever to be undebuggable.
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1426002357-6889-1-git-send-email-pbonzini@redhat.com
Fixes: c740ad92d0d958fa785e5d7aa1b67ecaf30a6a54
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Diffstat (limited to 'include')
-rw-r--r-- | include/qemu/queue.h | 11 |
1 files changed, 6 insertions, 5 deletions
diff --git a/include/qemu/queue.h b/include/qemu/queue.h index 80941506ce..f781aa20a8 100644 --- a/include/qemu/queue.h +++ b/include/qemu/queue.h @@ -197,11 +197,12 @@ struct { \ (head)->slh_first = (elm); \ } while (/*CONSTCOND*/0) -#define QSLIST_INSERT_HEAD_ATOMIC(head, elm, field) do { \ - do { \ - (elm)->field.sle_next = (head)->slh_first; \ - } while (atomic_cmpxchg(&(head)->slh_first, (elm)->field.sle_next, \ - (elm)) != (elm)->field.sle_next); \ +#define QSLIST_INSERT_HEAD_ATOMIC(head, elm, field) do { \ + typeof(elm) save_sle_next; \ + do { \ + save_sle_next = (elm)->field.sle_next = (head)->slh_first; \ + } while (atomic_cmpxchg(&(head)->slh_first, save_sle_next, (elm)) != \ + save_sle_next); \ } while (/*CONSTCOND*/0) #define QSLIST_MOVE_ATOMIC(dest, src) do { \ |