aboutsummaryrefslogtreecommitdiff
path: root/cpus.c
diff options
context:
space:
mode:
authorSebastian Tanase <sebastian.tanase@openwide.fr>2014-07-25 11:56:31 +0200
committerPaolo Bonzini <pbonzini@redhat.com>2014-08-06 17:53:07 +0200
commitc2aa5f819900660f936faadfe92fe5d60a562482 (patch)
tree09c74582ec42dfebee60217a01665e784eb20747 /cpus.c
parenta8bfac37085c3372366d722f131a7e18d664ee4d (diff)
cpu-exec: Add sleeping algorithm
The goal is to sleep qemu whenever the guest clock is in advance compared to the host clock (we use the monotonic clocks). The amount of time to sleep is calculated in the execution loop in cpu_exec. At first, we tried to approximate at each for loop the real time elapsed while searching for a TB (generating or retrieving from cache) and executing it. We would then approximate the virtual time corresponding to the number of virtual instructions executed. The difference between these 2 values would allow us to know if the guest is in advance or delayed. However, the function used for measuring the real time (qemu_clock_get_ns(QEMU_CLOCK_REALTIME)) proved to be very expensive. We had an added overhead of 13% of the total run time. Therefore, we modified the algorithm and only take into account the difference between the 2 clocks at the begining of the cpu_exec function. During the for loop we try to reduce the advance of the guest only by computing the virtual time elapsed and sleeping if necessary. The overhead is thus reduced to 3%. Even though this method still has a noticeable overhead, it no longer is a bottleneck in trying to achieve a better guest frequency for which the guest clock is faster than the host one. As for the the alignement of the 2 clocks, with the first algorithm the guest clock was oscillating between -1 and 1ms compared to the host clock. Using the second algorithm we notice that the guest is 5ms behind the host, which is still acceptable for our use case. The tests where conducted using fio and stress. The host machine in an i5 CPU at 3.10GHz running Debian Jessie (kernel 3.12). The guest machine is an arm versatile-pb built with buildroot. Currently, on our test machine, the lowest icount we can achieve that is suitable for aligning the 2 clocks is 6. However, we observe that the IO tests (using fio) are slower than the cpu tests (using stress). Signed-off-by: Sebastian Tanase <sebastian.tanase@openwide.fr> Tested-by: Camille Bégué <camille.begue@openwide.fr> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Diffstat (limited to 'cpus.c')
-rw-r--r--cpus.c17
1 files changed, 17 insertions, 0 deletions
diff --git a/cpus.c b/cpus.c
index 7e09538799..19245e99b9 100644
--- a/cpus.c
+++ b/cpus.c
@@ -219,6 +219,23 @@ int64_t cpu_get_clock(void)
return ti;
}
+/* return the offset between the host clock and virtual CPU clock */
+int64_t cpu_get_clock_offset(void)
+{
+ int64_t ti;
+ unsigned start;
+
+ do {
+ start = seqlock_read_begin(&timers_state.vm_clock_seqlock);
+ ti = timers_state.cpu_clock_offset;
+ if (!timers_state.cpu_ticks_enabled) {
+ ti -= get_clock();
+ }
+ } while (seqlock_read_retry(&timers_state.vm_clock_seqlock, start));
+
+ return -ti;
+}
+
/* enable cpu_get_ticks()
* Caller must hold BQL which server as mutex for vm_clock_seqlock.
*/