aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-04-20memory: introduce memory_region_notify_one()Peter Xu
Generalizing the notify logic in memory_region_notify_iommu() into a single function. This can be further used in customized replay() functions for IOMMUs. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: \"Michael S. Tsirkin\" <mst@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1491562755-23867-5-git-send-email-peterx@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-04-20memory: provide iommu_replay_all()Peter Xu
This is an "global" version of existing memory_region_iommu_replay() - we announce the translations to all the registered notifiers, instead of a specific one. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: \"Michael S. Tsirkin\" <mst@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1491562755-23867-4-git-send-email-peterx@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-04-20memory: provide IOMMU_NOTIFIER_FOREACH macroPeter Xu
A new macro is provided to iterate all the IOMMU notifiers hooked under specific IOMMU memory region. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: \"Michael S. Tsirkin\" <mst@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1491562755-23867-3-git-send-email-peterx@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-04-20memory: add section range info for IOMMU notifierPeter Xu
In this patch, IOMMUNotifier.{start|end} are introduced to store section information for a specific notifier. When notification occurs, we not only check the notification type (MAP|UNMAP), but also check whether the notified iova range overlaps with the range of specific IOMMU notifier, and skip those notifiers if not in the listened range. When removing an region, we need to make sure we removed the correct VFIOGuestIOMMU by checking the IOMMUNotifier.start address as well. This patch is solving the problem that vfio-pci devices receive duplicated UNMAP notification on x86 platform when vIOMMU is there. The issue is that x86 IOMMU has a (0, 2^64-1) IOMMU region, which is splitted by the (0xfee00000, 0xfeefffff) IRQ region. AFAIK this (splitted IOMMU region) is only happening on x86. This patch also helps vhost to leverage the new interface as well, so that vhost won't get duplicated cache flushes. In that sense, it's an slight performance improvement. Suggested-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1491562755-23867-2-git-send-email-peterx@redhat.com> [ehabkost: included extra vhost_iommu_region_del() change from Peter Xu] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-04-20Merge remote-tracking branch ↵Peter Maydell
'remotes/pmaydell/tags/pull-target-arm-20170420' into staging target-arm queue: * implement M profile exception return properly * cadence GEM: fix multiqueue handling bugs * pxa2xx.c: QOMify a device * arm/kvm: Remove trailing newlines from error_report() * stellaris: Don't hw_error() on bad register accesses * Add assertion about FSC format for syndrome registers * Move excnames[] array into arm_log_exceptions() * exynos: minor code cleanups * hw/arm/boot: take Linux/arm64 TEXT_OFFSET header field into account * Fix APSR writes via M profile MSR # gpg: Signature made Thu 20 Apr 2017 17:39:35 BST # gpg: using RSA key 0x3C2525ED14360CDE # gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>" # gpg: aka "Peter Maydell <pmaydell@gmail.com>" # gpg: aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>" # Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83 15CF 3C25 25ED 1436 0CDE * remotes/pmaydell/tags/pull-target-arm-20170420: (24 commits) arm: Remove workarounds for old M-profile exception return implementation arm: Implement M profile exception return properly arm: Track M profile handler mode state in TB flags arm: Abstract out "are we singlestepping" test to utility function arm: Move condition-failed codepath generation out of if() arm: Move gen_set_condexec() and gen_set_pc_im() up in the file arm: Factor out "generate right kind of step exception" arm: Thumb shift operations should not permit interworking branches arm: Don't implement BXJ on M-profile CPUs xlnx-zynqmp: Set the Cadence GEM revision cadence_gem: Make the revision a property cadence_gem: Correct the interupt logic cadence_gem: Correct the multi-queue can rx logic cadence_gem: Read the correct queue descriptor hw/arm: Qomify pxa2xx.c arm/kvm: Remove trailing newlines from error_report() stellaris: Don't hw_error() on bad register accesses target/arm: Add assertion about FSC format for syndrome registers arm: Move excnames[] array into arm_log_exceptions() target/arm: Add missing entries to excnames[] for log strings ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-04-20arm: Remove workarounds for old M-profile exception return implementationPeter Maydell
Now that we've rewritten M-profile exception return so that the magic PC values are not visible to other parts of QEMU, we can delete the special casing of them elsewhere. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Message-id: 1491844419-12485-10-git-send-email-peter.maydell@linaro.org
2017-04-20arm: Implement M profile exception return properlyPeter Maydell
On M profile, return from exceptions happen when code in Handler mode executes one of the following function call return instructions: * POP or LDM which loads the PC * LDR to PC * BX register and the new PC value is 0xFFxxxxxx. QEMU tries to implement this by not treating the instruction specially but then catching the attempt to execute from the magic address value. This is not ideal, because: * there are guest visible differences from the architecturally specified behaviour (for instance jumping to 0xFFxxxxxx via a different instruction should not cause an exception return but it will in the QEMU implementation) * we have to account for it in various places (like refusing to take an interrupt if the PC is at a magic value, and making sure that the MPU doesn't deny execution at the magic value addresses) Drop these hacks, and instead implement exception return the way the architecture specifies -- by having the relevant instructions check for the magic value and raise the 'do an exception return' QEMU internal exception immediately. The effect on the generated code is minor: bx lr, old code (and new code for Thread mode): TCG: mov_i32 tmp5,r14 movi_i32 tmp6,$0xfffffffffffffffe and_i32 pc,tmp5,tmp6 movi_i32 tmp6,$0x1 and_i32 tmp5,tmp5,tmp6 st_i32 tmp5,env,$0x218 exit_tb $0x0 set_label $L0 exit_tb $0x7f2aabd61993 x86_64 generated code: 0x7f2aabe87019: mov %ebx,%ebp 0x7f2aabe8701b: and $0xfffffffffffffffe,%ebp 0x7f2aabe8701e: mov %ebp,0x3c(%r14) 0x7f2aabe87022: and $0x1,%ebx 0x7f2aabe87025: mov %ebx,0x218(%r14) 0x7f2aabe8702c: xor %eax,%eax 0x7f2aabe8702e: jmpq 0x7f2aabe7c016 bx lr, new code when in Handler mode: TCG: mov_i32 tmp5,r14 movi_i32 tmp6,$0xfffffffffffffffe and_i32 pc,tmp5,tmp6 movi_i32 tmp6,$0x1 and_i32 tmp5,tmp5,tmp6 st_i32 tmp5,env,$0x218 movi_i32 tmp5,$0xffffffffff000000 brcond_i32 pc,tmp5,geu,$L1 exit_tb $0x0 set_label $L1 movi_i32 tmp5,$0x8 call exception_internal,$0x0,$0,env,tmp5 x86_64 generated code: 0x7fe8fa1264e3: mov %ebp,%ebx 0x7fe8fa1264e5: and $0xfffffffffffffffe,%ebx 0x7fe8fa1264e8: mov %ebx,0x3c(%r14) 0x7fe8fa1264ec: and $0x1,%ebp 0x7fe8fa1264ef: mov %ebp,0x218(%r14) 0x7fe8fa1264f6: cmp $0xff000000,%ebx 0x7fe8fa1264fc: jae 0x7fe8fa126509 0x7fe8fa126502: xor %eax,%eax 0x7fe8fa126504: jmpq 0x7fe8fa122016 0x7fe8fa126509: mov %r14,%rdi 0x7fe8fa12650c: mov $0x8,%esi 0x7fe8fa126511: mov $0x56095dbeccf5,%r10 0x7fe8fa12651b: callq *%r10 which is a difference of one cmp/branch-not-taken. This will be lost in the noise of having to exit generated code and look up the next TB anyway. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 1491844419-12485-9-git-send-email-peter.maydell@linaro.org
2017-04-20arm: Track M profile handler mode state in TB flagsPeter Maydell
For M profile exception-return handling we'd like to generate different code for some instructions depending on whether we are in Handler mode or Thread mode. This isn't the same as "are we privileged or user", so we need an extra bit in the TB flags to distinguish. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 1491844419-12485-8-git-send-email-peter.maydell@linaro.org
2017-04-20arm: Abstract out "are we singlestepping" test to utility functionPeter Maydell
We now test for "are we singlestepping" in several places and it's not a trivial check because we need to care about both architectural singlestep and QEMU gdbstub singlestep. We're also about to add another place that needs to make this check, so pull the condition out into a function. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 1491844419-12485-7-git-send-email-peter.maydell@linaro.org
2017-04-20arm: Move condition-failed codepath generation out of if()Peter Maydell
Move the code to generate the "condition failed" instruction codepath out of the if (singlestepping) {} else {}. This will allow adding support for handling a new is_jmp type which can't be neatly split into "singlestepping case" versus "not singlestepping case". Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Message-id: 1491844419-12485-6-git-send-email-peter.maydell@linaro.org
2017-04-20arm: Move gen_set_condexec() and gen_set_pc_im() up in the filePeter Maydell
Move the utility routines gen_set_condexec() and gen_set_pc_im() up in the file, as we will want to use them from a function placed earlier in the file than their current location. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Message-id: 1491844419-12485-5-git-send-email-peter.maydell@linaro.org
2017-04-20arm: Factor out "generate right kind of step exception"Peter Maydell
We currently have two places that do: if (dc->ss_active) { gen_step_complete_exception(dc); } else { gen_exception_internal(EXCP_DEBUG); } Factor this out into its own function, as we're about to add a third place that needs the same logic. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Message-id: 1491844419-12485-4-git-send-email-peter.maydell@linaro.org
2017-04-20arm: Thumb shift operations should not permit interworking branchesPeter Maydell
In Thumb mode, the only instructions which can cause an interworking branch by writing the PC are BLX, BX, BXJ, LDR, POP and LDM. Unlike ARM mode, data processing instructions which target the PC do not cause interworking branches. When we added support for doing interworking branches on writes to PC from data processing instructions in commit 21aeb3430ce7ba, we accidentally changed a Thumb instruction to have interworking branch behaviour for writes to PC. (MOV, MOVS register-shifted register, encoding T2; this is the standard encoding for LSL/LSR/ASR/ROR (register).) For this encoding, behaviour with Rd == R15 is specified as UNPREDICTABLE, so allowing an interworking branch is within spec, but it's confusing and differs from our handling of this class of UNPREDICTABLE for other Thumb ALU operations. Make it perform a simple (non-interworking) branch like the others. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 1491844419-12485-3-git-send-email-peter.maydell@linaro.org
2017-04-20arm: Don't implement BXJ on M-profile CPUsPeter Maydell
For M-profile CPUs, the BXJ instruction does not exist at all, and the encoding should always UNDEF. We were accidentally implementing it to behave like A-profile BXJ; correct the error. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Message-id: 1491844419-12485-2-git-send-email-peter.maydell@linaro.org
2017-04-20xlnx-zynqmp: Set the Cadence GEM revisionAlistair Francis
Signed-off-by: Alistair Francis <alistair.francis@xilinx.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 026dbe01a1d42619eee30ce3f2079741bf04bc73.1491947224.git.alistair.francis@xilinx.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-04-20cadence_gem: Make the revision a propertyAlistair Francis
Expose the Cadence GEM revision as a property. Signed-off-by: Alistair Francis <alistair.francis@xilinx.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 541324373cf87b50f8be0439a0cb89f5028b016f.1491947224.git.alistair.francis@xilinx.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-04-20cadence_gem: Correct the interupt logicAlistair Francis
This patch fixes two mistakes in the interrupt logic. First we only trigger single-queue or multi-queue interrupts if the status register is set. This logic was already used for non multi-queue interrupts but it also applies to multi-queue interrupts. Secondly we need to lower the interrupts if the ISR isn't set. As part of this we can remove the other interrupt lowering logic and consolidate it inside gem_update_int_status(). Signed-off-by: Alistair Francis <alistair.francis@xilinx.com> Message-id: 438bcc014f8f8a2f8f68f322cb6a53f4c04688c2.1491947224.git.alistair.francis@xilinx.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
2017-04-20cadence_gem: Correct the multi-queue can rx logicAlistair Francis
Correct the buffer descriptor busy logic to work correctly when using multiple queues. Signed-off-by: Alistair Francis <alistair.francis@xilinx.com> Message-id: 8a7e8059984e27d46a276a66299d035a0afd280f.1491947224.git.alistair.francis@xilinx.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
2017-04-20cadence_gem: Read the correct queue descriptorAlistair Francis
Read the correct descriptor instead of hardcoding the first (q=0). Signed-off-by: Alistair Francis <alistair.francis@xilinx.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 988b183dcf951856d8b3379f7e911ec95233bbf4.1491947224.git.alistair.francis@xilinx.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-04-20hw/arm: Qomify pxa2xx.cSuramya Shah
Signed-off-by: Suramya Shah <shah.suramya@gmail.com> Message-id: 20170415180316.2694-1-shah.suramya@gmail.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-04-20arm/kvm: Remove trailing newlines from error_report()Ishani Chugh
Signed-off-by: Ishani Chugh <chugh.ishani@research.iiit.ac.in> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1491629987-6826-1-git-send-email-chugh.ishani@research.iiit.ac.in Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-04-20stellaris: Don't hw_error() on bad register accessesPeter Maydell
Current recommended style is to log a guest error on bad register accesses, not kill the whole system with hw_error(). Change the hw_error() calls to log as LOG_GUEST_ERROR or LOG_UNIMP or use g_assert_not_reached() as appropriate. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 1491486314-25823-1-git-send-email-peter.maydell@linaro.org
2017-04-20target/arm: Add assertion about FSC format for syndrome registersPeter Maydell
In tlb_fill() we construct a syndrome register value from a fault status register value which is filled in by arm_tlb_fill(). arm_tlb_fill() returns FSR values which might be in the format used with short-format page descriptors, or the format used with long-format (LPAE) descriptors. The syndrome register always uses LPAE-format FSR status codes. It isn't actually possible to end up delivering a syndrome register value to the guest for a fault which is reported with a short-format FSR (that kind of stage 1 fault will only happen for an AArch32 translation regime which doesn't have a syndrome register, and can never be redirected to an AArch64 or Hyp exception level). Add an assertion which checks this, and adjust the code so that we construct a syndrome with an invalid status code, rather than allowing set bits in the FSR input to randomly corrupt other fields in the syndrome. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Message-id: 1491486152-24304-1-git-send-email-peter.maydell@linaro.org
2017-04-20arm: Move excnames[] array into arm_log_exceptions()Peter Maydell
The excnames[] array is defined in internals.h because we used to use it from two different source files for handling logging of AArch32 and AArch64 exception entry. Refactoring means that it's now used only in arm_log_exception() in helper.c, so move the array into that function. Suggested-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 1491821097-5647-1-git-send-email-peter.maydell@linaro.org
2017-04-20target/arm: Add missing entries to excnames[] for log stringsPeter Maydell
Recent changes have added new EXCP_ values to ARM but forgot to update the excnames[] array which is used to provide human-readable strings when printing information about the exception for debug logging. Add the missing entries, and add a comment to the list of #defines to help avoid the mistake being repeated in future. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Message-id: 1491486340-25988-1-git-send-email-peter.maydell@linaro.org
2017-04-20hw/misc/exynos4210_pmu: Reorder local variables for readabilityKrzysztof Kozlowski
Short declaration of 'i' was in the middle of declarations with assignments. Make it a little bit more readable. Additionally switch from "unsigned" to "unsigned int" as this pattern is more widely used. No functional change. Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 20170313184750.429-4-krzk@kernel.org Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-04-20hw/char/exynos4210_uart: Constify static array and few argumentsKrzysztof Kozlowski
The static array exynos4210_uart_regs with register values is not modified so it can be made const. Few other functions accept driver or uart state as an argument but they do not change it and do not cast it so this can be made const for code safeness. Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Message-id: 20170313184750.429-3-krzk@kernel.org Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-04-20hw/arm/exynos: Convert fprintf to qemu_log_mask/error_reportKrzysztof Kozlowski
qemu_log_mask() and error_report() are preferred over fprintf() for logging errors. Also remove square brackets [] and additional new line characters in printed messages. Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 20170313184750.429-2-krzk@kernel.org [PMM: wrapped long line] Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-04-20hw/arm/boot: take Linux/arm64 TEXT_OFFSET header field into accountArd Biesheuvel
The arm64 boot protocol stipulates that the kernel must be loaded TEXT_OFFSET bytes beyond a 2 MB aligned base address, where TEXT_OFFSET could be any 4 KB multiple between 0 and 2 MB, and whose value can be found in the header of the Image file. So after attempts to load the arm64 kernel image as an ELF file or as a U-Boot image have failed (both of which have their own way of specifying the load offset), try to determine the TEXT_OFFSET from the image after loading it but before mapping it as a ROM mapping into the guest address space. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 1489414630-21609-1-git-send-email-ard.biesheuvel@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-04-20Open 2.10 development treePeter Maydell
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-04-20Update version for v2.9.0 releasev2.9.0Peter Maydell
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-04-18Update version for v2.9.0-rc5 releasev2.9.0-rc5Peter Maydell
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-04-18Merge remote-tracking branch 'remotes/famz/tags/block-pull-request' into stagingPeter Maydell
# gpg: Signature made Tue 18 Apr 2017 15:58:32 BST # gpg: using RSA key 0xCA35624C6A9171C6 # gpg: Good signature from "Fam Zheng <famz@redhat.com>" # gpg: WARNING: This key is not certified with sufficiently trusted signatures! # gpg: It is not certain that the signature belongs to the owner. # Primary key fingerprint: 5003 7CB7 9706 0F76 F021 AD56 CA35 624C 6A91 71C6 * remotes/famz/tags/block-pull-request: block: Drain BH in bdrv_drained_begin block: Walk bs->children carefully in bdrv_drain_recurse Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-04-18block: Drain BH in bdrv_drained_beginFam Zheng
During block job completion, nothing is preventing block_job_defer_to_main_loop_bh from being called in a nested aio_poll(), which is a trouble, such as in this code path: qmp_block_commit commit_active_start bdrv_reopen bdrv_reopen_multiple bdrv_reopen_prepare bdrv_flush aio_poll aio_bh_poll aio_bh_call block_job_defer_to_main_loop_bh stream_complete bdrv_reopen block_job_defer_to_main_loop_bh is the last step of the stream job, which should have been "paused" by the bdrv_drained_begin/end in bdrv_reopen_multiple, but it is not done because it's in the form of a main loop BH. Similar to why block jobs should be paused between drained_begin and drained_end, BHs they schedule must be excluded as well. To achieve this, this patch forces draining the BH in BDRV_POLL_WHILE. As a side effect this fixes a hang in block_job_detach_aio_context during system_reset when a block job is ready: #0 0x0000555555aa79f3 in bdrv_drain_recurse #1 0x0000555555aa825d in bdrv_drained_begin #2 0x0000555555aa8449 in bdrv_drain #3 0x0000555555a9c356 in blk_drain #4 0x0000555555aa3cfd in mirror_drain #5 0x0000555555a66e11 in block_job_detach_aio_context #6 0x0000555555a62f4d in bdrv_detach_aio_context #7 0x0000555555a63116 in bdrv_set_aio_context #8 0x0000555555a9d326 in blk_set_aio_context #9 0x00005555557e38da in virtio_blk_data_plane_stop #10 0x00005555559f9d5f in virtio_bus_stop_ioeventfd #11 0x00005555559fa49b in virtio_bus_stop_ioeventfd #12 0x00005555559f6a18 in virtio_pci_stop_ioeventfd #13 0x00005555559f6a18 in virtio_pci_reset #14 0x00005555559139a9 in qdev_reset_one #15 0x0000555555916738 in qbus_walk_children #16 0x0000555555913318 in qdev_walk_children #17 0x0000555555916738 in qbus_walk_children #18 0x00005555559168ca in qemu_devices_reset #19 0x000055555581fcbb in pc_machine_reset #20 0x00005555558a4d96 in qemu_system_reset #21 0x000055555577157a in main_loop_should_exit #22 0x000055555577157a in main_loop #23 0x000055555577157a in main The rationale is that the loop in block_job_detach_aio_context cannot make any progress in pausing/completing the job, because bs->in_flight is 0, so bdrv_drain doesn't process the block_job_defer_to_main_loop BH. With this patch, it does. Reported-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20170418143044.12187-3-famz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Tested-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>
2017-04-18block: Walk bs->children carefully in bdrv_drain_recurseFam Zheng
The recursive bdrv_drain_recurse may run a block job completion BH that drops nodes. The coming changes will make that more likely and use-after-free would happen without this patch Stash the bs pointer and use bdrv_ref/bdrv_unref in addition to QLIST_FOREACH_SAFE to prevent such a case from happening. Since bdrv_unref accesses global state that is not protected by the AioContext lock, we cannot use bdrv_ref/bdrv_unref unconditionally. Fortunately the protection is not needed in IOThread because only main loop can modify a graph with the AioContext lock held. Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20170418143044.12187-2-famz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Tested-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>
2017-04-189pfs: local: set the path of the export root to "."Greg Kurz
The local backend was recently converted to using "at*()" syscalls in order to ensure all accesses happen below the shared directory. This requires that we only pass relative paths, otherwise the dirfd argument to the "at*()" syscalls is ignored and the path is treated as an absolute path in the host. This is actually the case for paths in all fids, with the notable exception of the root fid, whose path is "/". This causes the following backend ops to act on the "/" directory of the host instead of the virtfs shared directory when the export root is involved: - lstat - chmod - chown - utimensat ie, chmod /9p_mount_point in the guest will be converted to chmod / in the host for example. This could cause security issues with a privileged QEMU. All "*at()" syscalls are being passed an open file descriptor. In the case of the export root, this file descriptor points to the path in the host that was passed to -fsdev. The fix is thus as simple as changing the path of the export root fid to be "." instead of "/". This is CVE-2017-7471. Cc: qemu-stable@nongnu.org Reported-by: Léo Gaspard <leo@gaspard.io> Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-04-11Update version for v2.9.0-rc4 releasev2.9.0-rc4Peter Maydell
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-04-11block/io: Comment out permission assertionsMax Reitz
In case of block migration, there may be writes to BlockBackends that do not have the write permission taken. Before this issue is fixed (which is not going to happen in 2.9), we therefore cannot assert that this is the case. Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Tested-by: Kevin Wolf <kwolf@redhat.com> Message-id: 20170411145050.31290-1-mreitz@redhat.com Tested-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-04-11sheepdog: Fix crash in co_read_response()Kevin Wolf
This fixes a regression introduced in commit 9d456654. aio_co_wake() can only be used to reenter a coroutine that was already previously entered, otherwise co->ctx is uninitialised and we access garbage. Using it immediately after qemu_coroutine_create() like in co_read_response() is wrong and causes segfaults. Replace the call with aio_co_enter(), which gets an explicit AioContext parameter and works even for new coroutines. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Tested-by: Kashyap Chamarthy <kchamart@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 1491919733-21065-1-git-send-email-kwolf@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-04-11Merge remote-tracking branch 'remotes/maxreitz/tags/pull-block-2017-04-11' ↵Peter Maydell
into staging Block patches for 2.9.0-rc4 # gpg: Signature made Tue 11 Apr 2017 14:40:07 BST # gpg: using RSA key 0xF407DB0061D5CF40 # gpg: Good signature from "Max Reitz <mreitz@redhat.com>" # Primary key fingerprint: 91BE B60A 30DB 3E88 57D1 1829 F407 DB00 61D5 CF40 * remotes/maxreitz/tags/pull-block-2017-04-11: iscsi: Fix iscsi_create throttle: Remove block from group on hot-unplug block: pass the right options for BlockDriver.bdrv_open() Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-04-11iscsi: Fix iscsi_createFam Zheng
Since d5895fcb (iscsi: Split URL into individual options), creating qcow2 image on an iscsi LUN fails: qemu-img create -f qcow2 iscsi://$SERVER/$IQN/0 1G qemu-img: iscsi://$SERVER/$IQN/0: Could not create image: Invalid argument The problem is iscsi_open now expects that transport_name, portal and target are already parsed into structured options by iscsi_parse_filename, but it is not called in iscsi_create. Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 20170410075451.21329-1-famz@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> [mreitz: Dropped now superfluous qdict_put(bs_options, "filename", ...)] Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-04-11throttle: Remove block from group on hot-unplugEric Blake
When a block device that is part of a throttle group is hot-unplugged, we forgot to remove it from the throttle group. This leaves stale memory around, and causes an easily reproducible crash: $ ./x86_64-softmmu/qemu-system-x86_64 -nodefaults -nographic -qmp stdio \ -device virtio-scsi-pci,bus=pci.0 -drive \ id=drive_image2,if=none,format=raw,file=file2,bps=512000,iops=100,group=foo \ -device scsi-hd,id=image2,drive=drive_image2 -drive \ id=drive_image3,if=none,format=raw,file=file3,bps=512000,iops=100,group=foo \ -device scsi-hd,id=image3,drive=drive_image3 {'execute':'qmp_capabilities'} {'execute':'device_del','arguments':{'id':'image3'}} {'execute':'system_reset'} Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1428810 Suggested-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-id: 20170406190847.29347-1-eblake@redhat.com Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-04-11block: pass the right options for BlockDriver.bdrv_open()Dong Jia Shi
raw_open() expects the caller always passing in the right actual @options parameter. But when trying to applying snapshot on a RBD image, bdrv_snapshot_goto() calls raw_open() (by calling the bdrv_open callback on the BlockDriver) with a NULL @options, and that will result in a Segmentation fault. For the other non-raw format drivers, it also makes sense to passing in the actual options, althought they don't trigger the problem so far. Let's prepare a @options by adding the "file" key-value pair to a copy of the actual options that were given for the node (i.e. bs->options), and pass it to the callback. BlockDriver.bdrv_open() expects bs->file to be NULL and just overwrites it with the result from bdrv_open_child(). That means we should actually make sure it's NULL because otherwise the child BDS will have a reference count that is 1 too high. So we unconditionally invoke bdrv_unref_child() before calling BlockDriver.bdrv_open(), and we wrap everything in bdrv_ref()/bdrv_unref() so the BDS isn't deleted in the meantime. Suggested-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Message-id: 20170405091909.36357-2-bjsdjshi@linux.vnet.ibm.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-04-11Merge remote-tracking branch 'remotes/famz/tags/block-pull-request' into stagingPeter Maydell
# gpg: Signature made Tue 11 Apr 2017 13:10:55 BST # gpg: using RSA key 0xCA35624C6A9171C6 # gpg: Good signature from "Fam Zheng <famz@redhat.com>" # gpg: WARNING: This key is not certified with sufficiently trusted signatures! # gpg: It is not certain that the signature belongs to the owner. # Primary key fingerprint: 5003 7CB7 9706 0F76 F021 AD56 CA35 624C 6A91 71C6 * remotes/famz/tags/block-pull-request: sheepdog: Use bdrv_coroutine_enter before BDRV_POLL_WHILE block: Fix bdrv_co_flush early return block: Use bdrv_coroutine_enter to start I/O coroutines qemu-io-cmds: Use bdrv_coroutine_enter blockjob: Use bdrv_coroutine_enter to start coroutine block: Introduce bdrv_coroutine_enter async: Introduce aio_co_enter coroutine: Extract qemu_aio_coroutine_enter tests/block-job-txn: Don't start block job before adding to txn block: Quiesce old aio context during bdrv_set_aio_context block: Make bdrv_parent_drained_begin/end public Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-04-11sheepdog: Use bdrv_coroutine_enter before BDRV_POLL_WHILEFam Zheng
When called from main thread, the coroutine should run in the context of bs. Use bdrv_coroutine_enter to ensure that. Signed-off-by: Fam Zheng <famz@redhat.com>
2017-04-11block: Fix bdrv_co_flush early returnFam Zheng
bdrv_inc_in_flight and bdrv_dec_in_flight are mandatory for BDRV_POLL_WHILE to work, even for the shortcut case where flush is unnecessary. Move the if block to below bdrv_dec_in_flight, and BTW fix the variable declaration position. Signed-off-by: Fam Zheng <famz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2017-04-11block: Use bdrv_coroutine_enter to start I/O coroutinesFam Zheng
BDRV_POLL_WHILE waits for the started I/O by releasing bs's ctx then polling the main context, which relies on the yielded coroutine continuing on bs->ctx before notifying qemu_aio_context with bdrv_wakeup(). Thus, using qemu_coroutine_enter to start I/O is wrong because if the coroutine is entered from main loop, co->ctx will be qemu_aio_context, as a result of the "release, poll, acquire" loop of BDRV_POLL_WHILE, race conditions happen when both main thread and the iothread access the same BDS: main loop iothread ----------------------------------------------------------------------- blockdev_snapshot aio_context_acquire(bs->ctx) virtio_scsi_data_plane_handle_cmd bdrv_drained_begin(bs->ctx) bdrv_flush(bs) bdrv_co_flush(bs) aio_context_acquire(bs->ctx).enter ... qemu_coroutine_yield(co) BDRV_POLL_WHILE() aio_context_release(bs->ctx) aio_context_acquire(bs->ctx).return ... aio_co_wake(co) aio_poll(qemu_aio_context) ... co_schedule_bh_cb() ... qemu_coroutine_enter(co) ... /* (A) bdrv_co_flush(bs) /* (B) I/O on bs */ continues... */ aio_context_release(bs->ctx) aio_context_acquire(bs->ctx) Note that in above case, bdrv_drained_begin() doesn't do the "release, poll, acquire" in BDRV_POLL_WHILE, because bs->in_flight == 0. Fix this by using bdrv_coroutine_enter and enter coroutine in the right context. iotests 109 output is updated because the coroutine reenter flow during mirror job complete is different (now through co_queue_wakeup, instead of the unconditional qemu_coroutine_switch before), making the end job len different. Signed-off-by: Fam Zheng <famz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2017-04-11qemu-io-cmds: Use bdrv_coroutine_enterFam Zheng
qemu_coroutine_create associates @co to qemu_aio_context but we poll blk's context below. If the coroutine yields, it may never get resumed again. Use bdrv_coroutine_enter to make sure we are starting the I/O on the right context. Signed-off-by: Fam Zheng <famz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2017-04-11blockjob: Use bdrv_coroutine_enter to start coroutineFam Zheng
Resuming and especially starting of the block job coroutine, could be issued in the main thread. However the coroutine's "home" ctx should be set to the same context as job->blk. Use bdrv_coroutine_enter to ensure that. Signed-off-by: Fam Zheng <famz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2017-04-11block: Introduce bdrv_coroutine_enterFam Zheng
Signed-off-by: Fam Zheng <famz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>