slackcoder/qemu - QEMU is a generic and open source machine & userspace emulator and virtualizer

author	Paolo Bonzini <pbonzini@redhat.com>	2014-12-02 12:05:48 +0100
committer	Stefan Hajnoczi <stefanha@redhat.com>	2015-01-13 13:43:29 +0000
commit	4d68e86bb10159099da0798f74e7512955f15eec (patch)
tree	ee9ee441dbb03e93b74c238b6deecd86b29725c1 /target-s390x/mem_helper.c
parent	c740ad92d0d958fa785e5d7aa1b67ecaf30a6a54 (diff)

coroutine: rewrite pool to avoid mutex

This patch removes the mutex by using fancy lock-free manipulation of the pool. Lock-free stacks and queues are not hard, but they can suffer from the ABA problem, so they are better avoided unless you have some deferred reclamation scheme like RCU. Otherwise you have to stick with adding to a list and emptying it completely. This is what this patch does, by coupling a lock-free global list of available coroutines with per-thread lists that are actually used on coroutine creation.

Whenever the destruction pool is big enough, the next thread that runs out of coroutines will steal the whole destruction pool. This is positive in two ways:

1) The allocation does not have to do any atomic operation in the fast path; it uses only thread-local storage. Once every POOL_BATCH_SIZE allocations it does a single atomic_xchg. Release does an atomic_cmpxchg loop, which hopefully doesn't cause any starvation, and an atomic_inc.

A later patch will also remove atomic operations from the release path and try to avoid the atomic_xchg altogether, succeeding in doing so if all devices either use ioeventfd or are not submitting requests actively.

2) In theory this should be completely adaptive. The number of coroutines around should be a little more than POOL_BATCH_SIZE * number of allocating threads, so this also empties qemu_coroutine_adjust_pool_size. (The previous pool size was POOL_BATCH_SIZE * number of block backends, so it was a bit more generous. But if you actually have many high-iodepth disks, it's better to put them in different iothreads, which will also use separate thread pools and aio=native file descriptors.)

This speeds up perf/cost (in tests/test-coroutine) by a factor of ~1.33. Whether or not we end up with some kind of coroutine bypass scheme, it cannot hurt to optimize hot code.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1417518350-6167-6-git-send-email-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
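The pool scheme described in the commit message can be sketched in portable C11. This is a minimal illustration under stated assumptions, not the patch's code: the type Co and the functions co_alloc/co_release are hypothetical, C11 <stdatomic.h> operations stand in for QEMU's atomic_xchg/atomic_cmpxchg/atomic_inc helpers, POOL_BATCH_SIZE is given an assumed value of 64, and the cap of 2 * POOL_BATCH_SIZE on the release pool is an assumption the message does not state. The global list supports only "push one node" and "take the whole list", which is the combination that sidesteps ABA.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Assumed value; the patch uses a POOL_BATCH_SIZE constant for this role. */
#define POOL_BATCH_SIZE 64

/* Hypothetical coroutine object: only the pool link is modeled. */
typedef struct Co {
    struct Co *next;
} Co;

/* Global release pool. Threads only push single nodes onto it or take the
 * whole list at once, so no compare-and-swap pop (the ABA-prone operation)
 * is ever needed. */
static _Atomic(Co *) release_pool = NULL;
static atomic_uint release_pool_size = 0;

/* Per-thread list used on allocation: accessed with plain loads/stores. */
static _Thread_local Co *alloc_pool = NULL;

Co *co_alloc(void)
{
    Co *co = alloc_pool;

    /* Slow path, taken only when the thread-local list is empty. */
    if (!co && atomic_load(&release_pool_size) > POOL_BATCH_SIZE) {
        /* Steal the entire release pool with a single exchange. The counter
         * reset races with concurrent pushes, so it is only a heuristic. */
        atomic_store(&release_pool_size, 0);
        co = atomic_exchange(&release_pool, NULL);
    }
    if (co) {
        /* Keep the rest of the (possibly stolen) list for later calls. */
        alloc_pool = co->next;
        return co;
    }
    /* Nothing pooled: fall back to a fresh heap allocation. */
    return calloc(1, sizeof(Co));
}

void co_release(Co *co)
{
    assert(co != NULL);

    /* Assumed cap so the pool cannot grow without bound. */
    if (atomic_load(&release_pool_size) < 2 * POOL_BATCH_SIZE) {
        /* Lock-free push: retry until the head has not moved under us. */
        Co *head = atomic_load(&release_pool);
        do {
            co->next = head;
        } while (!atomic_compare_exchange_weak(&release_pool, &head, co));
        atomic_fetch_add(&release_pool_size, 1);
        return;
    }
    free(co);
}
```

A push's compare-and-swap depends only on the current head pointer, never on head->next, so even if the head is taken and re-pushed between the load and the CAS, the resulting list is still consistent. A real implementation would also free each thread's leftover alloc_pool when the thread exits; that is omitted here.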

Diffstat (limited to 'target-s390x/mem_helper.c')

0 files changed, 0 insertions, 0 deletions
