aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-09-13mirror: auto complete active commitWen Congyang
Auto complete mirror job in background to prevent from blocking synchronously Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com> Signed-off-by: Wang WeiWei <wangww.fnst@cn.fujitsu.com> Message-id: 1469602913-20979-7-git-send-email-xiecl.fnst@cn.fujitsu.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-13docs: block replication's descriptionWen Congyang
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com> Signed-off-by: Wang WeiWei <wangww.fnst@cn.fujitsu.com> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Gonglei <arei.gonglei@huawei.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1469602913-20979-6-git-send-email-xiecl.fnst@cn.fujitsu.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-13block: Link backup into block coreWen Congyang
Some programs that add a dependency on it will use the block layer directly. Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com> Signed-off-by: Wang WeiWei <wangww.fnst@cn.fujitsu.com> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Gonglei <arei.gonglei@huawei.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1469602913-20979-5-git-send-email-xiecl.fnst@cn.fujitsu.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-13Backup: export interfaces for extra serializationChanglong Xie
Normal backup(sync='none') workflow: step 1. NBD peformance I/O write from client to server qcow2_co_writev bdrv_co_writev ... bdrv_aligned_pwritev notifier_with_return_list_notify -> backup_do_cow bdrv_driver_pwritev // write new contents step 2. drive-backup sync=none backup_do_cow { wait_for_overlapping_requests cow_request_begin for(; start < end; start++) { bdrv_co_readv_no_serialising //read old contents from Secondary disk bdrv_co_writev // write old contents to hidden-disk } cow_request_end } step 3. Then roll back to "step 1" to write new contents to Secondary disk. And for replication, we must make sure that we only read the old contents from Secondary disk in order to keep contents consistent. 1) Replication workflow of Secondary virtio-blk ^ -------> 1 NBD | || server 3 replication || ^ ^ || | backing backing | || Secondary disk 6<-------- hidden-disk 5 <-------- active-disk 4 || | ^ || '-------------------------' || drive-backup sync=none 2 Hence, we need these interfaces to implement coarse-grained serialization between COW of Secondary disk and the read operation of replication. Example codes about how to use them: *#include "block/block_backup.h" static coroutine_fn int xxx_co_readv() { CowRequest req; BlockJob *job = secondary_disk->bs->job; if (job) { backup_wait_for_overlapping_requests(job, start, end); backup_cow_request_begin(&req, job, start, end); ret = bdrv_co_readv(); backup_cow_request_end(&req); goto out; } ret = bdrv_co_readv(); out: return ret; } Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Wang WeiWei <wangww.fnst@cn.fujitsu.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1469602913-20979-4-git-send-email-xiecl.fnst@cn.fujitsu.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-13Backup: clear all bitmap when doing block checkpointWen Congyang
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com> Signed-off-by: Wang WeiWei <wangww.fnst@cn.fujitsu.com> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Gonglei <arei.gonglei@huawei.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1469602913-20979-3-git-send-email-xiecl.fnst@cn.fujitsu.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-13block: unblock backup operations in backing fileWen Congyang
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com> Signed-off-by: Wang WeiWei <wangww.fnst@cn.fujitsu.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Kashyap Chamarthy <kchamart@redhat.com> Message-id: 1469602913-20979-2-git-send-email-xiecl.fnst@cn.fujitsu.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-13virtio-blk: rename virtio_device_info to virtio_blk_infoChanglong Xie
The old one is confusing with @virtio_device_info in virtio.c, so make it more appropriate. Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com> Message-id: 1470214147-32560-1-git-send-email-xiecl.fnst@cn.fujitsu.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-13linux-aio: process completions from ioq_submit()Roman Pen
In order to reduce completion latency it makes sense to harvest completed requests ASAP. Very fast backend device can complete requests just after submission, so it is worth trying to check ring buffer in order to peek completed requests directly after io_submit() has been called. Indeed, this patch reduces the completions latencies and increases the overall throughput, e.g. the following is the percentiles of number of completed requests at once: 1th 10th 20th 30th 40th 50th 60th 70th 80th 90th 99.99th Before 2 4 42 112 128 128 128 128 128 128 128 After 1 1 4 14 33 45 47 48 50 51 108 That means, that before the current patch is applied the ring buffer is observed as full (128 requests were consumed at once) in 60% of calls. After patch is applied the distribution of number of completed requests is "smoother" and the queue (requests in-flight) is almost never full. The fio read results are the following (write results are almost the same and are not showed here): Before ------ job: (groupid=0, jobs=8): err= 0: pid=2227: Tue Jul 19 11:29:50 2016 Description : [Emulation of Storage Server Access Pattern] read : io=54681MB, bw=1822.7MB/s, iops=179779, runt= 30001msec slat (usec): min=172, max=16883, avg=338.35, stdev=109.66 clat (usec): min=1, max=21977, avg=1051.45, stdev=299.29 lat (usec): min=317, max=22521, avg=1389.83, stdev=300.73 clat percentiles (usec): | 1.00th=[ 346], 5.00th=[ 596], 10.00th=[ 708], 20.00th=[ 852], | 30.00th=[ 932], 40.00th=[ 996], 50.00th=[ 1048], 60.00th=[ 1112], | 70.00th=[ 1176], 80.00th=[ 1256], 90.00th=[ 1384], 95.00th=[ 1496], | 99.00th=[ 1800], 99.50th=[ 1928], 99.90th=[ 2320], 99.95th=[ 2672], | 99.99th=[ 4704] bw (KB /s): min=205229, max=553181, per=12.50%, avg=233278.26, stdev=18383.51 After ------ job: (groupid=0, jobs=8): err= 0: pid=2220: Tue Jul 19 11:31:51 2016 Description : [Emulation of Storage Server Access Pattern] read : io=57637MB, bw=1921.2MB/s, iops=189529, runt= 30002msec slat (usec): min=169, max=20636, avg=329.61, stdev=124.18 clat (usec): min=2, max=19592, avg=988.78, stdev=251.04 lat (usec): min=381, max=21067, avg=1318.42, stdev=243.58 clat percentiles (usec): | 1.00th=[ 310], 5.00th=[ 580], 10.00th=[ 748], 20.00th=[ 876], | 30.00th=[ 908], 40.00th=[ 948], 50.00th=[ 1012], 60.00th=[ 1064], | 70.00th=[ 1080], 80.00th=[ 1128], 90.00th=[ 1224], 95.00th=[ 1288], | 99.00th=[ 1496], 99.50th=[ 1608], 99.90th=[ 1960], 99.95th=[ 2256], | 99.99th=[ 5408] bw (KB /s): min=212149, max=390160, per=12.49%, avg=245746.04, stdev=11606.75 Throughput increased from 1822MB/s to 1921MB/s, average completion latencies decreased from 1051us to 988us. Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com> Message-id: 1468931263-32667-4-git-send-email-roman.penyaev@profitbricks.com Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: qemu-devel@nongnu.org Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-13linux-aio: split processing events functionRoman Pen
Prepare processing events function to be called from ioq_submit(), thus split function on two parts: the first harvests completed IO requests, the second submits pending requests. Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com> Message-id: 1468931263-32667-3-git-send-email-roman.penyaev@profitbricks.com Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: qemu-devel@nongnu.org Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-13linux-aio: consume events in userspace instead of calling io_geteventsRoman Pen
AIO context in userspace is represented as a simple ring buffer, which can be consumed directly without entering the kernel, which obviously can bring some performance gain. QEMU does not use timeout value for waiting for events completions, so we can consume all events from userspace. Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com> Message-id: 1468931263-32667-2-git-send-email-roman.penyaev@profitbricks.com Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: qemu-devel@nongnu.org Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-13qcow2: avoid memcpy(dst, NULL, len)Stefan Hajnoczi
Section "7.1.4 Use of library functions" in the C99 standard says: If an argument to a function has an invalid value (such as [...] a null pointer [...]) [...] the behavior is undefined. Additionally the "searching and sorting" functions are specified as requiring valid pointer values as described in 7.1.4. This patch fixes the following sanitizer errors: block/qcow2.c:1807:41: runtime error: null pointer passed as argument 2, which is declared to never be null block/qcow2-cluster.c:86:26: runtime error: null pointer passed as argument 2, which is declared to never be null Reported-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1473758138-19260-1-git-send-email-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-12Merge remote-tracking branch 'remotes/mcayland/tags/qemu-openbios-signed' ↵Peter Maydell
into staging Update OpenBIOS images # gpg: Signature made Mon 12 Sep 2016 11:51:09 BST # gpg: using RSA key 0x5BC2C56FAE0F321F # gpg: Good signature from "Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>" # Primary key fingerprint: CC62 1AB9 8E82 200D 915C C9C4 5BC2 C56F AE0F 321F * remotes/mcayland/tags/qemu-openbios-signed: Update OpenBIOS images to c5542f2 built from submodule. Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-09-12Merge remote-tracking branch ↵Peter Maydell
'remotes/berrange/tags/pull-qcrypto-2016-09-12-1' into staging Merge qcrypto 2016/09/12 v1 # gpg: Signature made Mon 12 Sep 2016 12:02:20 BST # gpg: using RSA key 0xBE86EBB415104FDF # gpg: Good signature from "Daniel P. Berrange <dan@berrange.com>" # gpg: aka "Daniel P. Berrange <berrange@redhat.com>" # Primary key fingerprint: DAF3 A6FD B26B 6291 2D0E 8E3F BE86 EBB4 1510 4FDF * remotes/berrange/tags/pull-qcrypto-2016-09-12-1: crypto: report enum strings instead of values in errors crypto: fix building complaint crypto: ensure XTS is only used with ciphers with 16 byte blocks Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-09-12crypto: report enum strings instead of values in errorsDaniel P. Berrange
Several error messages print out the raw enum value, which is less than helpful to users, as these values are not documented, nor stable across QEMU releases. Switch to use the enum string instead. The nettle impl also had two typos where it mistakenly said "algorithm" instead of "mode", and actually reported the algorithm value too. Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-09-12crypto: fix building complaintGonglei
gnutls commit 846753877d renamed LIBGNUTLS_VERSION_NUMBER to GNUTLS_VERSION_NUMBER. If using gnutls before that verion, we'll get the below warning: crypto/tlscredsx509.c:618:5: warning: "GNUTLS_VERSION_NUMBER" is not defined Because gnutls 3.x still defines LIBGNUTLS_VERSION_NUMBER for back compat, Let's use LIBGNUTLS_VERSION_NUMBER instead of GNUTLS_VERSION_NUMBER to fix building complaint. Signed-off-by: Gonglei <arei.gonglei@huawei.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-09-12crypto: ensure XTS is only used with ciphers with 16 byte blocksDaniel P. Berrange
The XTS cipher mode needs to be used with a cipher which has a block size of 16 bytes. If a mis-matching block size is used, the code will either corrupt memory beyond the IV array, or not fully encrypt/decrypt the IV. This fixes a memory corruption crash when attempting to use cast5-128 with xts, since the former has an 8 byte block size. A test case is added to ensure the cipher creation fails with such an invalid combination. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-09-12Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into stagingPeter Maydell
virtio,vhost,pc: fixes and updates balloon fixes wrt migration virtio-vsock device support Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # gpg: Signature made Fri 09 Sep 2016 22:36:13 BST # gpg: using RSA key 0x281F0DB8D28D5469 # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * remotes/mst/tags/for_upstream: vhost-vsock: add virtio sockets device tests/acpi: speedup acpi tests virtio-pci: minor refactoring vhost: don't set vring call if no vector virtio-pci: error out when both legacy and modern modes are disabled virtio-balloon: fix stats vq migration virtio: add virtqueue_rewind() virtio-balloon: discard virtqueue element on reset virtio: zero vq->inuse in virtio_reset() virtio-pci: reduce modern_mem_bar size target-i386: present virtual L3 cache info for vcpus pc: Add 2.8 machine virtio-pci: use size from correct structure virtio: Tell the user what went wrong when event_notifier_init failed Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-09-12Update OpenBIOS images to c5542f2 built from submodule.Mark Cave-Ayland
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
2016-09-10vhost-vsock: add virtio sockets deviceStefan Hajnoczi
Implement the new virtio sockets device for host<->guest communication using the Sockets API. Most of the work is done in a vhost kernel driver so that virtio-vsock can hook into the AF_VSOCK address family. The QEMU vhost-vsock device handles configuration and live migration while the rx/tx happens in the vhost_vsock.ko Linux kernel driver. The vsock device must be given a CID (host-wide unique address): # qemu -device vhost-vsock-pci,id=vhost-vsock-pci0,guest-cid=3 ... For more information see: http://qemu-project.org/Features/VirtioVsock [Endianness fixes and virtio-ccw support by Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> [mst: rebase to master] Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-09-10tests/acpi: speedup acpi testsMarcel Apfelbaum
Use kvm acceleration if available. Disable kernel-irqchip and use qemu64 cpu for both kvm and tcg cases. Using kvm acceleration saves about a second and disabling kernel-irqchip has no visible performance impact. Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-09-09virtio-pci: minor refactoringMichael S. Tsirkin
!legacy && !modern is shorter than !(legacy || modern). I also perfer this (less ()s) as a matter of taste. Cc: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-09-09vhost: don't set vring call if no vectorJason Wang
We used to set vring call fd unconditionally even if guest driver does not use MSIX for this vritqueue at all. This will cause lots of unnecessary userspace access and other checks for drivers does not use interrupt at all (e.g virtio-net pmd). So check and clean vring call fd if guest does not use any vector for this virtqueue at all. Perf diffs (on rx) shows lots of cpus wasted on vhost_signal() were saved: # 28.12% -27.82% [vhost] [k] vhost_signal 14.44% -1.69% [kernel.vmlinux] [k] copy_user_generic_string 7.05% +1.53% [kernel.vmlinux] [k] __free_page_frag 6.51% +5.53% [vhost] [k] vhost_get_vq_desc ... Pktgen tests shows 15.8% improvement on rx pps and 6.5% on tx pps. Before: RX 2.08Mpps TX 1.35Mpps After: RX 2.41Mpps TX 1.44Mpps Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-09-09virtio-pci: error out when both legacy and modern modes are disabledGreg Kurz
Without presuming if we got there because of a user mistake or some more subtle bug in the tooling, it really does not make sense to implement a non-functional device. Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-09-09virtio-balloon: fix stats vq migrationLadi Prosek
The statistics virtqueue is not migrated properly because virtio-balloon does not include s->stats_vq_elem in the migration stream. After migration the statistics virtqueue hangs because the host never completes the last element (s->stats_vq_elem is NULL on the destination QEMU). Therefore the guest never submits new elements and the virtqueue is hung. Instead of changing the migration stream format in an incompatible way, detect the migration case and rewind the virtqueue so the last element can be completed. Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Roman Kagan <rkagan@virtuozzo.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Suggested-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Ladi Prosek <lprosek@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-09-09virtio: add virtqueue_rewind()Stefan Hajnoczi
virtqueue_discard() requires a VirtQueueElement but virtio-balloon does not migrate its in-use element. Introduce a new function that is similar to virtqueue_discard() but doesn't require a VirtQueueElement. This will allow virtio-balloon to access element again after migration with the usual proviso that the guest may have modified the vring since last time. Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Roman Kagan <rkagan@virtuozzo.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Ladi Prosek <lprosek@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-09-09virtio-balloon: discard virtqueue element on resetLadi Prosek
The one pending element is being freed but not discarded on device reset, which causes svq->inuse to creep up, eventually hitting the "Virtqueue size exceeded" error. Properly discarding the element on device reset makes sure that its buffers are unmapped and the inuse counter stays balanced. Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Roman Kagan <rkagan@virtuozzo.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Ladi Prosek <lprosek@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-09-09virtio: zero vq->inuse in virtio_reset()Stefan Hajnoczi
vq->inuse must be zeroed upon device reset like most other virtqueue fields. In theory, virtio_reset() just needs assert(vq->inuse == 0) since devices must clean up in-flight requests during reset (requests cannot not be leaked!). In practice, it is difficult to achieve vq->inuse == 0 across reset because balloon, blk, 9p, etc implement various different strategies for cleaning up requests. Most devices call g_free(elem) directly without telling virtio.c that the VirtQueueElement is cleaned up. Therefore vq->inuse is not decremented during reset. This patch zeroes vq->inuse and trusts that devices are not leaking VirtQueueElements across reset. I will send a follow-up series that refactors request life-cycle across all devices and converts vq->inuse = 0 into assert(vq->inuse == 0) but this more invasive approach is not appropriate for stable trees. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Cc: qemu-stable <qemu-stable@nongnu.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Ladi Prosek <lprosek@redhat.com>
2016-09-09virtio-pci: reduce modern_mem_bar sizeMarcel Apfelbaum
Currently each VQ Notification Virtio Capability is allocated on a different page. The idea is to enable split drivers within guests, however there are no known plans to do that. The allocation will result in a 8MB BAR, more than various guest firmwares pre-allocates for PCI Bridges hotplug process. Reserve 4 bytes per VQ by default and add a new parameter "page-per-vq" to be used with split drivers. Signed-off-by: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-09-09target-i386: present virtual L3 cache info for vcpusLongpeng(Mike)
Some software algorithms are based on the hardware's cache info, for example, for x86 linux kernel, when cpu1 want to wakeup a task on cpu2, cpu1 will trigger a resched IPI and told cpu2 to do the wakeup if they don't share low level cache. Oppositely, cpu1 will access cpu2's runqueue directly if they share llc. The relevant linux-kernel code as bellow: static void ttwu_queue(struct task_struct *p, int cpu) { struct rq *rq = cpu_rq(cpu); ...... if (... && !cpus_share_cache(smp_processor_id(), cpu)) { ...... ttwu_queue_remote(p, cpu); /* will trigger RES IPI */ return; } ...... ttwu_do_activate(rq, p, 0); /* access target's rq directly */ ...... } In real hardware, the cpus on the same socket share L3 cache, so one won't trigger a resched IPIs when wakeup a task on others. But QEMU doesn't present a virtual L3 cache info for VM, then the linux guest will trigger lots of RES IPIs under some workloads even if the virtual cpus belongs to the same virtual socket. For KVM, there will be lots of vmexit due to guest send IPIs. The workload is a SAP HANA's testsuite, we run it one round(about 40 minuates) and observe the (Suse11sp3)Guest's amounts of RES IPIs which triggering during the period: No-L3 With-L3(applied this patch) cpu0: 363890 44582 cpu1: 373405 43109 cpu2: 340783 43797 cpu3: 333854 43409 cpu4: 327170 40038 cpu5: 325491 39922 cpu6: 319129 42391 cpu7: 306480 41035 cpu8: 161139 32188 cpu9: 164649 31024 cpu10: 149823 30398 cpu11: 149823 32455 cpu12: 164830 35143 cpu13: 172269 35805 cpu14: 179979 33898 cpu15: 194505 32754 avg: 268963.6 40129.8 The VM's topology is "1*socket 8*cores 2*threads". After present virtual L3 cache info for VM, the amounts of RES IPIs in guest reduce 85%. For KVM, vcpus send IPIs will cause vmexit which is expensive, so it can cause severe performance degradation. We had tested the overall system performance if vcpus actually run on sparate physical socket. With L3 cache, the performance improves 7.2%~33.1%(avg:15.7%). Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-09-09pc: Add 2.8 machineLongpeng(Mike)
This will used by the next patch. Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-09-09virtio-pci: use size from correct structureMichael S. Tsirkin
PIO MR registration should use size from the correct notify struct. Doesn't affect any visible behaviour because the field values are the same (both are 4). Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-09-09virtio: Tell the user what went wrong when event_notifier_init failedThomas Huth
event_notifier_init() can fail in real life, for example when there are not enough open file handles available (EMFILE) when using a lot of devices. So instead of leaving the average user with a cryptic error number only, print out a proper error message with strerror() instead, so that the user has a better way to figure out what is going on and that using "ulimit -n" might help here for example. Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-09-09Merge remote-tracking branch 'remotes/famz/tags/docker-pull-request' into ↵Peter Maydell
staging # gpg: Signature made Fri 09 Sep 2016 05:54:35 BST # gpg: using RSA key 0xCA35624C6A9171C6 # gpg: Good signature from "Fam Zheng <famz@redhat.com>" # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: 5003 7CB7 9706 0F76 F021 AD56 CA35 624C 6A91 71C6 * remotes/famz/tags/docker-pull-request: docker: silence debootstrap when --quiet is given docker: build debootstrap after cloning docker: make sure debootstrap is at least 1.0.67 docker: print warning if EXECUTABLE is not set when building debootstrap image docker: debian-bootstrap.pre: print helpful message if DEB_ARCH/DEB_TYPE unset docker: debian-bootstrap.pre: print error messages to stderr docker: avoid dependency on 'realpath' package docker.py: don't hang on large docker output docker: Add a glib2-2.22 image Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-09-09qtest.c: Allow zero size in memset qtest commandsPeter Maydell
Some tests use the qtest protocol "memset" command with a zero size, expecting it to do nothing. However in the current code this will result in calling memset() with a NULL pointer, which is undefined behaviour. Detect and specially handle zero sizes to avoid this. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1470393800-7882-1-git-send-email-peter.maydell@linaro.org
2016-09-08Merge remote-tracking branch 'remotes/elmarco/tags/leak-pull-request' into ↵Peter Maydell
staging Pull request v2: - dropped "tests: fix small leak in test-io-channel-command" that Daniel Berrange will pick - fixed "tests: add qtest_add_data_func_full" to work with glib < 2.26 # gpg: Signature made Thu 08 Sep 2016 15:16:54 BST # gpg: using RSA key 0xDAE8E10975969CE5 # gpg: Good signature from "Marc-André Lureau <marcandre.lureau@redhat.com>" # gpg: aka "Marc-André Lureau <marcandre.lureau@gmail.com>" # gpg: WARNING: This key is not certified with sufficiently trusted signatures! # gpg: It is not certain that the signature belongs to the owner. # Primary key fingerprint: 87A9 BD93 3F87 C606 D276 F62D DAE8 E109 7596 9CE5 * remotes/elmarco/tags/leak-pull-request: (25 commits) tests: fix postcopy-test leaks tests: fix rsp leak in postcopy-test tests: pc-cpu-test leaks fixes tests: add qtest_add_data_func_full bus: simplify name handling ipmi: free extern timer sd: free timer pc: keep gsi reference pc: free i8259 tests: fix qom-test leaks acpi-build: fix array leak machine: use class base init generated name pc: don't leak a20_line pc: simplify passing qemu_irq portio: keep references on portio tests: fix leak in test-string-input-visitor tests: fix check-qom-proplist leaks tests: fix check-qom-interface leaks tests: fix test-iov leaks tests: fix test-vmstate leaks ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-09-08tests: fix postcopy-test leaksMarc-André Lureau
A few strings are allocated and never freed. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2016-09-08tests: fix rsp leak in postcopy-testMarc-André Lureau
In all cases, even when the dict doesn't contain 'ram', the qmp response must be unref. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2016-09-08tests: pc-cpu-test leaks fixesMarc-André Lureau
The path is allocated and should be freed. The qmp response should be unref, but then 'machine' must be duplicated. Use a destroy function for the PCTestData. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2016-09-08tests: add qtest_add_data_func_fullMarc-André Lureau
Allows one to specify a destroy function for the test data. Add a fallback using glib g_test_add_vtable() internal function, whose signature changed over time. Tested with glib 2.22, 2.26 and 2.48, which according to git log should be enough to cover all variations. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2016-09-08bus: simplify name handlingMarc-André Lureau
Simplify a bit the code by using g_strdup_printf() and store it in a non-const value so casting is no longer needed, and ownership is clearer. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2016-09-08ipmi: free extern timerMarc-André Lureau
Free the timer allocated during instance init. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Corey Minyard <cminyard@mvista.com>
2016-09-08sd: free timerMarc-André Lureau
Free the timer allocated in instance_init. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Andrew Baumann <Andrew.Baumann@microsoft.com>
2016-09-08pc: keep gsi referenceMarc-André Lureau
Further cleanup would need to call qemu_free_irq() at the appropriate time, but for now this silences ASAN about direct leaks. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com>
2016-09-08pc: free i8259Marc-André Lureau
Simiarly to 2ba154cf4eb8636cdd3aa90f392ca9e77206ca39 Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
2016-09-08tests: fix qom-test leaksMarc-André Lureau
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2016-09-08acpi-build: fix array leakMarc-André Lureau
The free_ranges array is used as a temporary pointer array, the segment should still be freed, however, it shouldn't free the elements themself. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Tested-by: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
2016-09-08machine: use class base init generated nameMarc-André Lureau
machine_class_base_init() member name is allocated by machine_class_base_init(), but not freed by machine_class_finalize(). Simply freeing there doesn't work, because DEFINE_PC_MACHINE() overwrites it with a literal string. Fix DEFINE_PC_MACHINE() not to overwrite it, and add the missing free to machine_class_finalize(). Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com>
2016-09-08pc: don't leak a20_lineMarc-André Lureau
The irqs array is no longer being used Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2016-09-08pc: simplify passing qemu_irqMarc-André Lureau
qemu_irq is already a pointer, no need to have an extra pointer level. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2016-09-08portio: keep references on portioMarc-André Lureau
The isa_register_portio_list() function allocates ioports data/state. Let's keep the reference to this data on some owner. This isn't enough to fix leaks, but at least, ASAN stops complaining of direct leaks. Further cleanup would require calling portio_list_del/destroy(). Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>