aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-09-28linux-aio: fix re-entrant completion processingStefan Hajnoczi
Commit 0ed93d84edabc7656f5c998ae1a346fe8b94ca54 ("linux-aio: process completions from ioq_submit()") added an optimization that processes completions each time ioq_submit() returns with requests in flight. This commit introduces a "Co-routine re-entered recursively" error which can be triggered with -drive format=qcow2,aio=native. Fam Zheng <famz@redhat.com>, Kevin Wolf <kwolf@redhat.com>, and I debugged the following backtrace: (gdb) bt #0 0x00007ffff0a046f5 in raise () at /lib64/libc.so.6 #1 0x00007ffff0a062fa in abort () at /lib64/libc.so.6 #2 0x0000555555ac0013 in qemu_coroutine_enter (co=0x5555583464d0) at util/qemu-coroutine.c:113 #3 0x0000555555a4b663 in qemu_laio_process_completions (s=s@entry=0x555557e2f7f0) at block/linux-aio.c:218 #4 0x0000555555a4b874 in ioq_submit (s=s@entry=0x555557e2f7f0) at block/linux-aio.c:331 #5 0x0000555555a4ba12 in laio_do_submit (fd=fd@entry=13, laiocb=laiocb@entry=0x555559d38ae0, offset=offset@entry=2932727808, type=type@entry=1) at block/linux-aio.c:383 #6 0x0000555555a4bbd3 in laio_co_submit (bs=<optimized out>, s=0x555557e2f7f0, fd=13, offset=2932727808, qiov=0x555559d38e20, type=1) at block/linux-aio.c:402 #7 0x0000555555a4fd23 in bdrv_driver_preadv (bs=bs@entry=0x55555663bcb0, offset=offset@entry=2932727808, bytes=bytes@entry=8192, qiov=qiov@entry=0x555559d38e20, flags=0) at block/io.c:804 #8 0x0000555555a52b34 in bdrv_aligned_preadv (bs=bs@entry=0x55555663bcb0, req=req@entry=0x555559d38d20, offset=offset@entry=2932727808, bytes=bytes@entry=8192, align=align@entry=512, qiov=qiov@entry=0x555559d38e20, flags=0) at block/io.c:1041 #9 0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, offset=2932727808, bytes=8192, qiov=qiov@entry=0x555559d38e20, flags=flags@entry=0) at block/io.c:1133 #10 0x0000555555a29629 in qcow2_co_preadv (bs=0x555556635890, offset=6178725888, bytes=8192, qiov=0x555557527840, flags=<optimized out>) at block/qcow2.c:1509 #11 0x0000555555a4fd23 in bdrv_driver_preadv (bs=bs@entry=0x555556635890, offset=offset@entry=6178725888, bytes=bytes@entry=8192, qiov=qiov@entry=0x555557527840, flags=0) at block/io.c:804 #12 0x0000555555a52b34 in bdrv_aligned_preadv (bs=bs@entry=0x555556635890, req=req@entry=0x555559d39000, offset=offset@entry=6178725888, bytes=bytes@entry=8192, align=align@entry=1, qiov=qiov@entry=0x555557527840, flags=0) at block/io.c:1041 #13 0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, offset=offset@entry=6178725888, bytes=bytes@entry=8192, qiov=qiov@entry=0x555557527840, flags=flags@entry=0) at block/io.c:1133 #14 0x0000555555a4515a in blk_co_preadv (blk=0x5555566356d0, offset=6178725888, bytes=8192, qiov=0x555557527840, flags=0) at block/block-backend.c:783 #15 0x0000555555a45266 in blk_aio_read_entry (opaque=0x5555577025e0) at block/block-backend.c:991 #16 0x0000555555ac0cfa in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:78 It turned out that re-entrant ioq_submit() and completion processing between three requests caused this error. The following check is not sufficient to prevent recursively entering coroutines: if (laiocb->co != qemu_coroutine_self()) { qemu_coroutine_enter(laiocb->co); } As the following coroutine backtrace shows, not just the current coroutine (self) can be entered. There might also be other coroutines that are currently entered and transferred control due to the qcow2 lock (CoMutex): (gdb) qemu coroutine 0x5555583464d0 #0 0x0000555555ac0c90 in qemu_coroutine_switch (from_=from_@entry=0x5555583464d0, to_=to_@entry=0x5555572f9890, action=action@entry=COROUTINE_ENTER) at util/coroutine-ucontext.c:175 #1 0x0000555555abfe54 in qemu_coroutine_enter (co=0x5555572f9890) at util/qemu-coroutine.c:117 #2 0x0000555555ac031c in qemu_co_queue_run_restart (co=co@entry=0x5555583462c0) at util/qemu-coroutine-lock.c:60 #3 0x0000555555abfe5e in qemu_coroutine_enter (co=0x5555583462c0) at util/qemu-coroutine.c:119 #4 0x0000555555a4b663 in qemu_laio_process_completions (s=s@entry=0x555557e2f7f0) at block/linux-aio.c:218 #5 0x0000555555a4b874 in ioq_submit (s=s@entry=0x555557e2f7f0) at block/linux-aio.c:331 #6 0x0000555555a4ba12 in laio_do_submit (fd=fd@entry=13, laiocb=laiocb@entry=0x55555a338b40, offset=offset@entry=2911477760, type=type@entry=1) at block/linux-aio.c:383 #7 0x0000555555a4bbd3 in laio_co_submit (bs=<optimized out>, s=0x555557e2f7f0, fd=13, offset=2911477760, qiov=0x55555a338e80, type=1) at block/linux-aio.c:402 #8 0x0000555555a4fd23 in bdrv_driver_preadv (bs=bs@entry=0x55555663bcb0, offset=offset@entry=2911477760, bytes=bytes@entry=8192, qiov=qiov@entry=0x55555a338e80, flags=0) at block/io.c:804 #9 0x0000555555a52b34 in bdrv_aligned_preadv (bs=bs@entry=0x55555663bcb0, req=req@entry=0x55555a338d80, offset=offset@entry=2911477760, bytes=bytes@entry=8192, align=align@entry=512, qiov=qiov@entry=0x55555a338e80, flags=0) at block/io.c:1041 #10 0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, offset=2911477760, bytes=8192, qiov=qiov@entry=0x55555a338e80, flags=flags@entry=0) at block/io.c:1133 #11 0x0000555555a29629 in qcow2_co_preadv (bs=0x555556635890, offset=6157475840, bytes=8192, qiov=0x5555575df720, flags=<optimized out>) at block/qcow2.c:1509 #12 0x0000555555a4fd23 in bdrv_driver_preadv (bs=bs@entry=0x555556635890, offset=offset@entry=6157475840, bytes=bytes@entry=8192, qiov=qiov@entry=0x5555575df720, flags=0) at block/io.c:804 #13 0x0000555555a52b34 in bdrv_aligned_preadv (bs=bs@entry=0x555556635890, req=req@entry=0x55555a339060, offset=offset@entry=6157475840, bytes=bytes@entry=8192, align=align@entry=1, qiov=qiov@entry=0x5555575df720, flags=0) at block/io.c:1041 #14 0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, offset=offset@entry=6157475840, bytes=bytes@entry=8192, qiov=qiov@entry=0x5555575df720, flags=flags@entry=0) at block/io.c:1133 #15 0x0000555555a4515a in blk_co_preadv (blk=0x5555566356d0, offset=6157475840, bytes=8192, qiov=0x5555575df720, flags=0) at block/block-backend.c:783 #16 0x0000555555a45266 in blk_aio_read_entry (opaque=0x555557231aa0) at block/block-backend.c:991 #17 0x0000555555ac0cfa in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:78 Use the new qemu_coroutine_entered() function instead of comparing against qemu_coroutine_self(). This is correct because: 1. If a coroutine is not entered then it must have yielded to wait for I/O completion. It is therefore safe to enter. 2. If a coroutine is entered then it must be in ioq_submit()/qemu_laio_process_completions() because otherwise it would be yielded while waiting for I/O completion. Therefore it will check laio->ret and return from ioq_submit() instead of yielding, i.e. it's guaranteed not to hang. Reported-by: Fam Zheng <famz@redhat.com> Tested-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1474989516-18255-4-git-send-email-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-28test-coroutine: test qemu_coroutine_entered()Stefan Hajnoczi
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1474989516-18255-3-git-send-email-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-28coroutine: add qemu_coroutine_entered() functionStefan Hajnoczi
See the doc comments for a description of this new coroutine API. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1474989516-18255-2-git-send-email-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-28libqos: fix qvring_init()Laurent Vivier
"vq->desc[i].addr" is a 64bit value, so write it with writeq(), not writew(). struct vring_desc { __virtio64 addr; __virtio32 len; __virtio16 flags; __virtio16 next; }; Signed-off-by: Laurent Vivier <lvivier@redhat.com> Message-id: 1474903450-9605-1-git-send-email-lvivier@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-28iothread: check iothread->ctx before aio_context_unref to avoid assertionLin Ma
if iothread->ctx is set to NULL, aio_context_unref triggers the assertion: g_source_unref: assertion 'source != NULL' failed. The patch fixes it. Signed-off-by: Lin Ma <lma@suse.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20160926052958.10716-1-lma@suse.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-28aio-posix: avoid unnecessary aio_epoll_enabled() callsYaowei Bai
As epoll whether enabled or not is a global setting, we can just check it only once rather than checking it with every node iteration. Through this we can avoid a lot of checks when epoll is not enabled. Signed-off-by: Yaowei Bai <baiyaowei@cmss.chinamobile.com> Reviewed-by: Xiubo Li <lixiubo@cmss.chinamobile.com> Message-id: 1473851019-7005-3-git-send-email-baiyaowei@cmss.chinamobile.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-28block: mirror: fix wrong comment of mirror_startYaowei Bai
Obviously, we should write to '@target'. Signed-off-by: Yaowei Bai <baiyaowei@cmss.chinamobile.com> Reviewed-by: Xiubo Li <lixiubo@cmss.chinamobile.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1473851019-7005-2-git-send-email-baiyaowei@cmss.chinamobile.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-27Merge remote-tracking branch 'remotes/ehabkost/tags/x86-pull-request' into ↵Peter Maydell
staging x86 and machine queue, 2016-09-27 # gpg: Signature made Tue 27 Sep 2016 21:10:06 BST # gpg: using RSA key 0x2807936F984DC5A6 # gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>" # Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF D1AA 2807 936F 984D C5A6 * remotes/ehabkost/tags/x86-pull-request: sysbus: Remove ignored return value of FindSysbusDeviceFunc target-i386: Remove has_msr_* global vars for KVM features target-i386: Clear KVM CPUID features if KVM is disabled target-i386: Remove has_msr_hv_tsc global variable target-i386: Remove has_msr_hv_apic global variable target-i386: Remove has_msr_mtrr global variable target-i386: Move xsave component mask to features array target-i386: xsave: Calculate set of xsave components on realize target-i386: xsave: Helper function to calculate xsave area size target-i386: xsave: Simplify CPUID[0xD,0].{EAX,EDX} calculation target-i386: xsave: Calculate enabled components only once target-i386: Don't try to enable PT State xsave component target-i386: Move feature name arrays inside FeatureWordInfo linux-user: remove #define smp_{cores, threads} target-i386: Enable CPUID[0x8000000A] if SVM is enabled target-i386: Automatically set level/xlevel/xlevel2 when needed tests: Test CPUID level handling for old machines tests: Add test code for CPUID level/xlevel handling target-i386: Add a marker to end of the region zeroed on reset target-i386: Remove unused X86CPUDefinition::xlevel2 field Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-09-27sysbus: Remove ignored return value of FindSysbusDeviceFuncDavid Gibson
Functions of type FindSysbusDeviceFunc currently return an integer. However, this return value is always ignored by the caller in find_sysbus_device(). This changes the function type to return void, to avoid confusion over the function semantics. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-09-27target-i386: Remove has_msr_* global vars for KVM featuresEduardo Habkost
The global variables are not necessary because we can check KVM feature flags in X86CPU directly. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-09-27target-i386: Clear KVM CPUID features if KVM is disabledEduardo Habkost
This will ensure all checks for features[FEAT_KVM] in the code will be correct in case the KVM CPUID leaf is completely disabled. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-09-27target-i386: Remove has_msr_hv_tsc global variableEduardo Habkost
The global variable is not necessary because we can check cpu->hyperv_time directly. We just need to ensure cpu->hyperv_time will be cleared if the feature is not really being exposed to the guest due to missing KVM_CAP_HYPERV_TIME capability. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-09-27target-i386: Remove has_msr_hv_apic global variableEduardo Habkost
The global variable is not necessary because we can check cpu->hyperv_vapic directly. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-09-27target-i386: Remove has_msr_mtrr global variableEduardo Habkost
The global variable is not necessary because we can check the CPU feature flags directly. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-09-27target-i386: Move xsave component mask to features arrayEduardo Habkost
This will reuse the existing check/enforce logic in x86_cpu_filter_features() to check the xsave component bits against GET_SUPPORTED_CPUID. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-09-27target-i386: xsave: Calculate set of xsave components on realizeEduardo Habkost
Instead of doing complex calculations and calling kvm_arch_get_supported_cpuid() inside cpu_x86_cpuid(), calculate the set of required XSAVE components earlier, at realize time. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-09-27target-i386: xsave: Helper function to calculate xsave area sizeEduardo Habkost
Move the xsave area size calculation from cpu_x86_cpuid() inside its own function. While doing it, change it to use the XSAVE area struct sizes for the initial size, instead of the magic 0x240 number. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-09-27target-i386: xsave: Simplify CPUID[0xD,0].{EAX,EDX} calculationEduardo Habkost
Instead of assigning individual bits in a loop, just copy the values from ena_mask. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-09-27target-i386: xsave: Calculate enabled components only onceEduardo Habkost
Instead of checking both env->features and ena_mask at two different places in the CPUID code, initialize ena_mask based on the features that are enabled for the CPU, and then clear unsupported bits based on kvm_arch_get_supported_cpuid(). The results should be exactly the same, but it will make it easier to move the mask calculation elsewhare, and reuse x86_cpu_filter_features() for the kvm_arch_get_supported_cpuid() check. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-09-27target-i386: Don't try to enable PT State xsave componentEduardo Habkost
The code that calculates the set of supported XSAVE components on CPUID looks at ext_save_areas to find out which components should be enabled. However, if there are zeroed entries in the ext_save_areas array, the ((env->features[esa->feature] & esa->bits) == esa->bits) check will always succeed and QEMU will unconditionally try to enable the component. Luckily this never caused any problems because the only missing entry in ext_save_areas is the PT State component (bit 8), and KVM currently doesn't support it (so it was cleared on ena_mask). But the code was still incorrect and would break if KVM starts returning CPUID[EAX=0xD,ECX=0].EAX[bit 8] as supported on GET_SUPPORTED_CPUID. Fix the problem by changing the code to not enable a XSAVE component if ExtSaveArea::bits is zero. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-09-27target-i386: Move feature name arrays inside FeatureWordInfoEduardo Habkost
It makes it easier to guarantee the arrays are the right size, and to find information when looking at the code. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-09-27linux-user: remove #define smp_{cores, threads}Marc-André Lureau
Those are unneeded now that CPUState nr_{cores,threads} is always initialized. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-09-27target-i386: Enable CPUID[0x8000000A] if SVM is enabledEduardo Habkost
SVM needs CPUID[0x8000000A] to be available. So if SVM is enabled in a CPU model or explicitly in the command-line, adjust CPUID xlevel to expose the CPUID[0x8000000A] leaf. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-09-27target-i386: Automatically set level/xlevel/xlevel2 when neededEduardo Habkost
Instead of requiring users and management software to be aware of required CPUID level/xlevel/xlevel2 values for each feature, automatically increase those values when features need them. This was already done for CPUID[7].EBX, and is now made generic for all CPUID feature flags. Unit test included, to make sure we don't break ABI on older machine-types and don't mess with the CPUID level values if they are explicitly set by the user. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-09-27tests: Test CPUID level handling for old machinesEduardo Habkost
We're going to change the way level/xlevel/xlevel2 are handled when enabling features, but we need to keep the old behavior on existing machine types. Add test cases for that. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-09-27tests: Add test code for CPUID level/xlevel handlingEduardo Habkost
Add test code that will check if the automatic CPUID level changes are working as expected. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-09-27target-i386: Add a marker to end of the region zeroed on resetEduardo Habkost
Instead of using cpuid_level, use an empty struct as a marker (like we already did with {start,end}_init_save). This will avoid accidentaly resetting the wrong fields if we change the field ordering on CPUX86State. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-09-27target-i386: Remove unused X86CPUDefinition::xlevel2 fieldEduardo Habkost
No CPU model in builtin_x86_defs has xlevel2 set, so it is always zero. Delete the field. Note that this is not an user-visible change. It doesn't remove the ability to set xlevel2 on the command-line, it just removes an unused field in builtin_x86_defs. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-09-27Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into ↵Peter Maydell
staging # gpg: Signature made Tue 27 Sep 2016 11:05:56 BST # gpg: using RSA key 0xEF04965B398D6211 # gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>" # gpg: WARNING: This key is not certified with sufficiently trusted signatures! # gpg: It is not certain that the signature belongs to the owner. # Primary key fingerprint: 215D 46F4 8246 689E C77F 3562 EF04 965B 398D 6211 * remotes/jasowang/tags/net-pull-request: (27 commits) imx_fec: fix error in qemu_send_packet argument mcf_fec: fix error in qemu_send_packet argument net: mcf: limit buffer descriptor count e1000e: Fix EIAC register implementation e1000e: Fix spurious RX TCP ACK interrupts e1000e: Fix OTHER interrupts processing for MSI-X e1000e: Fix PBACLR implementation e1000e: Fix CTRL_EXT.EIAME behavior e1000e: Flush receive queues on link up e1000e: Flush all receive queues on receive enable net: limit allocation in nc_sendv_compat tap: Allow specifying a bridge e1000: fix buliding complaint docs: Add documentation for COLO-proxy MAINTAINERS: add maintainer for COLO-proxy filter-rewriter: rewrite tcp packet to keep secondary connection filter-rewriter: track connection and parse packet filter-rewriter: introduce filter-rewriter initialization colo-compare: add TCP, UDP, ICMP packet comparison colo-compare: introduce packet comparison thread ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-09-27imx_fec: fix error in qemu_send_packet argumentPaolo Bonzini
This uses the wrong frame size for packets composed of multiple descriptors. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-09-27mcf_fec: fix error in qemu_send_packet argumentPaolo Bonzini
This uses the wrong frame size for packets composed of multiple descriptors. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-09-27net: mcf: limit buffer descriptor countPrasad J Pandit
ColdFire Fast Ethernet Controller uses buffer descriptors to manage data flow to/fro receive & transmit queues. While transmitting packets, it could continue to read buffer descriptors if a buffer descriptor has length of zero and has crafted values in bd.flags. Set upper limit to number of buffer descriptors. Reported-by: Li Qiang <liqiang6-s@360.cn> Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-09-27e1000e: Fix EIAC register implementationDmitry Fleytman
This patch fixes 2 issues: 1. Bits set in EIAC register should be cleared from IMS when EIAM is not used. 2. Only bit that corresonds to the interrupt being raised should be cleared. See spec. 10.2.4.7 Interrupt Auto Clear Signed-off-by: Dmitry Fleytman <dmitry@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-09-27e1000e: Fix spurious RX TCP ACK interruptsDmitry Fleytman
Do not raise ACK interrupts when RFCTL.ACKDIS bit is set (see spec. 10.2.5.16). Signed-off-by: Dmitry Fleytman <dmitry@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-09-27e1000e: Fix OTHER interrupts processing for MSI-XDmitry Fleytman
Interrupt mask for legacy OTHER causes should not apply to MSI-X OTHER cause. Signed-off-by: Dmitry Fleytman <dmitry@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-09-27e1000e: Fix PBACLR implementationDmitry Fleytman
This patch fixes incorrect check for interrypt type being used. PBSCLR register is valid for MSI-X only. See spec. 10.2.3.13 MSI—X PBA Clear Signed-off-by: Dmitry Fleytman <dmitry@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-09-27e1000e: Fix CTRL_EXT.EIAME behaviorDmitry Fleytman
CTRL_EXT.EIAME bit controls clearing of IAM bits, but current code clears IMS bits instead. See spec. 10.2.2.5 Extended Device Control Register. Signed-off-by: Dmitry Fleytman <dmitry@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-09-27e1000e: Flush receive queues on link upDmitry Fleytman
Signed-off-by: Dmitry Fleytman <dmitry@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-09-27e1000e: Flush all receive queues on receive enableDmitry Fleytman
Before this patch first netdev queue only was flushed. Signed-off-by: Dmitry Fleytman <dmitry@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-09-27net: limit allocation in nc_sendv_compatPeter Lieven
we only need to allocate enough memory to hold the packet. This might be less than NET_BUFSIZE. Additionally fail early if the packet is larger than NET_BUFSIZE. Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-09-27tap: Allow specifying a bridgeAlexey Kardashevskiy
The tap backend is already using qemu-bridge-helper to attach tap interface to a bridge but (unlike the bridge backend) it always uses the default bridge name - br0. This adds a "br" property support to the tap backend. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Greg Kurz <groug@kaod.org> Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-09-27e1000: fix buliding complaintGonglei
hw/net/e1000e_core.c:56: warning: e1000e_set_interrupt_cause declared inline after being called hw/net/e1000e_core.c:56: warning: previous declaration of e1000e_set_interrupt_cause was here Signed-off-by: Gonglei <arei.gonglei@huawei.com> Reviewed-by: Dmitry Fleytman <dmitry@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-09-27docs: Add documentation for COLO-proxyZhang Chen
Introduce the design of COLO-proxy, and how to use it. Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-09-27MAINTAINERS: add maintainer for COLO-proxyZhang Chen
add Zhang Chen and Li zhijian as co-maintainers of COLO-proxy. Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-09-27filter-rewriter: rewrite tcp packet to keep secondary connectionZhang Chen
We will rewrite tcp packet secondary received and sent. When colo guest is a tcp server. Firstly, client start a tcp handshake. the packet's seq=client_seq, ack=0,flag=SYN. COLO primary guest get this pkt and mirror(filter-mirror) to secondary guest, secondary get it use filter-redirector. Then,primary guest response pkt (seq=primary_seq,ack=client_seq+1,flag=ACK|SYN). secondary guest response pkt (seq=secondary_seq,ack=client_seq+1,flag=ACK|SYN). In here,we use filter-rewriter save the secondary_seq to it's tcp connection. Finally handshake,client send pkt (seq=client_seq+1,ack=primary_seq+1,flag=ACK). Here,filter-rewriter can get primary_seq, and rewrite ack from primary_seq+1 to secondary_seq+1, recalculate checksum. So the secondary tcp connection kept good. When we send/recv packet. client send pkt(seq=client_seq+1+data_len,ack=primary_seq+1,flag=ACK|PSH). filter-rewriter rewrite ack and send to secondary guest. primary guest response pkt (seq=primary_seq+1,ack=client_seq+1+data_len,flag=ACK) secondary guest response pkt (seq=secondary_seq+1,ack=client_seq+1+data_len,flag=ACK) we rewrite secondary guest seq from secondary_seq+1 to primary_seq+1. So tcp connection kept good. In code We use offset( = secondary_seq - primary_seq ) to rewrite seq or ack. handle_primary_tcp_pkt: tcp_pkt->th_ack += offset; handle_secondary_tcp_pkt: tcp_pkt->th_seq -= offset; Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-09-27filter-rewriter: track connection and parse packetZhang Chen
We use net/colo.h to track connection and parse packet Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-09-27filter-rewriter: introduce filter-rewriter initializationZhang Chen
Filter-rewriter is a part of COLO project. It will rewrite some of secondary packet to make secondary guest's tcp connection established successfully. In this module we will rewrite tcp packet's ack to the secondary from primary,and rewrite tcp packet's seq to the primary from secondary. usage: colo secondary: -object filter-redirector,id=f1,netdev=hn0,queue=tx,indev=red0 -object filter-redirector,id=f2,netdev=hn0,queue=rx,outdev=red1 -object filter-rewriter,id=rew0,netdev=hn0,queue=all Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-09-27colo-compare: add TCP, UDP, ICMP packet comparisonZhang Chen
We add TCP,UDP,ICMP packet comparison to replace IP packet comparison. This can increase the accuracy of the package comparison. Less checkpoint more efficiency. Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-09-27colo-compare: introduce packet comparison threadZhang Chen
If primary packet is same with secondary packet, we will send primary packet and drop secondary packet, otherwise notify COLO frame to do checkpoint. If primary packet comes but secondary packet does not, after REGULAR_PACKET_CHECK_MS milliseconds we set the primary packet as old_packet,then do a checkpoint. Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-09-27colo-compare: track connection and enqueue packetZhang Chen
In this patch we use kernel jhash table to track connection, and then enqueue net packet like this: + CompareState ++ | | +---------------+ +---------------+ +---------------+ |conn list +--->conn +--------->conn | +---------------+ +---------------+ +---------------+ | | | | | | +---------------+ +---v----+ +---v----+ +---v----+ +---v----+ |primary | |secondary |primary | |secondary |packet | |packet + |packet | |packet + +--------+ +--------+ +--------+ +--------+ | | | | +---v----+ +---v----+ +---v----+ +---v----+ |primary | |secondary |primary | |secondary |packet | |packet + |packet | |packet + +--------+ +--------+ +--------+ +--------+ | | | | +---v----+ +---v----+ +---v----+ +---v----+ |primary | |secondary |primary | |secondary |packet | |packet + |packet | |packet + +--------+ +--------+ +--------+ +--------+ We use conn_list to record connection info. When we want to enqueue a packet, firstly get the connection from connection_track_table. then push the packet to g_queue(pri/sec) in it's own conn. Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Jason Wang <jasowang@redhat.com>