aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-10-06Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into stagingPeter Maydell
Block layer patches # gpg: Signature made Fri 06 Oct 2017 16:52:59 BST # gpg: using RSA key 0x7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (54 commits) block/mirror: check backing in bdrv_mirror_top_flush qcow2: truncate the tail of the image file after shrinking the image qcow2: fix return error code in qcow2_truncate() iotests: Fix 195 if IMGFMT is part of TEST_DIR block/mirror: check backing in bdrv_mirror_top_refresh_filename block: support passthrough of BDRV_REQ_FUA in crypto driver block: convert qcrypto_block_encrypt|decrypt to take bytes offset block: convert crypto driver to bdrv_co_preadv|pwritev block: fix data type casting for crypto payload offset crypto: expose encryption sector size in APIs block: use 1 MB bounce buffers for crypto instead of 16KB iotests: Add test 197 for covering copy-on-read block: Perform copy-on-read in loop block: Add blkdebug hook for copy-on-read iotests: Restore stty settings on completion block: Uniform handling of 0-length bdrv_get_block_status() qemu-io: Add -C for opening with copy-on-read commit: Remove overlay_bs qemu-iotests: Test commit block job where top has two parents qemu-iotests: Allow QMP pretty printing in common.qemu ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-10-06Merge remote-tracking branch ↵Peter Maydell
'remotes/pmaydell/tags/pull-target-arm-20171006' into staging target-arm: * v8M: more preparatory work * nvic: reset properly rather than leaving the nvic in a weird state * xlnx-zynqmp: Mark the "xlnx, zynqmp" device with user_creatable = false * sd: fix out-of-bounds check for multi block reads * arm: Fix SMC reporting to EL2 when QEMU provides PSCI # gpg: Signature made Fri 06 Oct 2017 16:58:15 BST # gpg: using RSA key 0x3C2525ED14360CDE # gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>" # gpg: aka "Peter Maydell <pmaydell@gmail.com>" # gpg: aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>" # Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83 15CF 3C25 25ED 1436 0CDE * remotes/pmaydell/tags/pull-target-arm-20171006: nvic: Add missing code for writing SHCSR.HARDFAULTPENDED bit target/arm: Factor out "get mmuidx for specified security state" target/arm: Fix calculation of secure mm_idx values target/arm: Implement security attribute lookups for memory accesses nvic: Implement Security Attribution Unit registers target/arm: Add v8M support to exception entry code target/arm: Add support for restoring v8M additional state context target/arm: Update excret sanity checks for v8M target/arm: Add new-in-v8M SFSR and SFAR target/arm: Don't warn about exception return with PC low bit set for v8M target/arm: Warn about restoring to unaligned stack target/arm: Check for xPSR mismatch usage faults earlier for v8M target/arm: Restore SPSEL to correct CONTROL register on exception return target/arm: Restore security state on exception return target/arm: Prepare for CONTROL.SPSEL being nonzero in Handler mode target/arm: Don't switch to target stack early in v7M exception return nvic: Clear the vector arrays and prigroup on reset hw/arm/xlnx-zynqmp: Mark the "xlnx, zynqmp" device with user_creatable = false hw/sd: fix out-of-bounds check for multi block reads arm: Fix SMC reporting to EL2 when QEMU provides PSCI Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-10-06nvic: Add missing code for writing SHCSR.HARDFAULTPENDED bitPeter Maydell
When we added support for the new SHCSR bits in v8M in commit 437d59c17e9 the code to support writing to the new HARDFAULTPENDED bit was accidentally only added for non-secure writes; the secure banked version of the bit should also be writable. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 1506092407-26985-21-git-send-email-peter.maydell@linaro.org
2017-10-06target/arm: Factor out "get mmuidx for specified security state"Peter Maydell
For the SG instruction and secure function return we are going to want to do memory accesses using the MMU index of the CPU in secure state, even though the CPU is currently in non-secure state. Write arm_v7m_mmu_idx_for_secstate() to do this job, and use it in cpu_mmu_index(). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 1506092407-26985-17-git-send-email-peter.maydell@linaro.org
2017-10-06target/arm: Fix calculation of secure mm_idx valuesPeter Maydell
In cpu_mmu_index() we try to do this: if (env->v7m.secure) { mmu_idx += ARMMMUIdx_MSUser; } but it will give the wrong answer, because ARMMMUIdx_MSUser includes the 0x40 ARM_MMU_IDX_M field, and so does the mmu_idx we're adding to, and we'll end up with 0x8n rather than 0x4n. This error is then nullified by the call to arm_to_core_mmu_idx() which masks out the high part, but we're about to factor out the code that calculates the ARMMMUIdx values so it can be used without passing it through arm_to_core_mmu_idx(), so fix this bug first. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 1506092407-26985-16-git-send-email-peter.maydell@linaro.org
2017-10-06target/arm: Implement security attribute lookups for memory accessesPeter Maydell
Implement the security attribute lookups for memory accesses in the get_phys_addr() functions, causing these to generate various kinds of SecureFault for bad accesses. The major subtlety in this code relates to handling of the case when the security attributes the SAU assigns to the address don't match the current security state of the CPU. In the ARM ARM pseudocode for validating instruction accesses, the security attributes of the address determine whether the Secure or NonSecure MPU state is used. At face value, handling this would require us to encode the relevant bits of state into mmu_idx for both S and NS at once, which would result in our needing 16 mmu indexes. Fortunately we don't actually need to do this because a mismatch between address attributes and CPU state means either: * some kind of fault (usually a SecureFault, but in theory perhaps a UserFault for unaligned access to Device memory) * execution of the SG instruction in NS state from a Secure & NonSecure code region The purpose of SG is simply to flip the CPU into Secure state, so we can handle it by emulating execution of that instruction directly in arm_v7m_cpu_do_interrupt(), which means we can treat all the mismatch cases as "throw an exception" and we don't need to encode the state of the other MPU bank into our mmu_idx values. This commit doesn't include the actual emulation of SG; it also doesn't include implementation of the IDAU, which is a per-board way to specify hard-coded memory attributes for addresses, which override the CPU-internal SAU if they specify a more secure setting than the SAU is programmed to. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 1506092407-26985-15-git-send-email-peter.maydell@linaro.org
2017-10-06nvic: Implement Security Attribution Unit registersPeter Maydell
Implement the register interface for the SAU: SAU_CTRL, SAU_TYPE, SAU_RNR, SAU_RBAR and SAU_RLAR. None of the actual behaviour is implemented here; registers just read back as written. When the CPU definition for Cortex-M33 is eventually added, its initfn will set cpu->sau_sregion, in the same way that we currently set cpu->pmsav7_dregion for the M3 and M4. Number of SAU regions is typically a configurable CPU parameter, but this patch doesn't provide a QEMU CPU property for it. We can easily add one when we have a board that requires it. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 1506092407-26985-14-git-send-email-peter.maydell@linaro.org
2017-10-06target/arm: Add v8M support to exception entry codePeter Maydell
Add support for v8M and in particular the security extension to the exception entry code. This requires changes to: * calculation of the exception-return magic LR value * push the callee-saves registers in certain cases * clear registers when taking non-secure exceptions to avoid leaking information from the interrupted secure code * switch to the correct security state on entry * use the vector table for the security state we're targeting Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 1506092407-26985-13-git-send-email-peter.maydell@linaro.org
2017-10-06target/arm: Add support for restoring v8M additional state contextPeter Maydell
For v8M, exceptions from Secure to Non-Secure state will save callee-saved registers to the exception frame as well as the caller-saved registers. Add support for unstacking these registers in exception exit when necessary. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 1506092407-26985-12-git-send-email-peter.maydell@linaro.org
2017-10-06target/arm: Update excret sanity checks for v8MPeter Maydell
In v8M, more bits are defined in the exception-return magic values; update the code that checks these so we accept the v8M values when the CPU permits them. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 1506092407-26985-11-git-send-email-peter.maydell@linaro.org
2017-10-06target/arm: Add new-in-v8M SFSR and SFARPeter Maydell
Add the new M profile Secure Fault Status Register and Secure Fault Address Register. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 1506092407-26985-10-git-send-email-peter.maydell@linaro.org
2017-10-06target/arm: Don't warn about exception return with PC low bit set for v8MPeter Maydell
In the v8M architecture, return from an exception to a PC which has bit 0 set is not UNPREDICTABLE; it is defined that bit 0 is discarded [R_HRJH]. Restrict our complaint about this to v7M. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 1506092407-26985-9-git-send-email-peter.maydell@linaro.org
2017-10-06target/arm: Warn about restoring to unaligned stackPeter Maydell
Attempting to do an exception return with an exception frame that is not 8-aligned is UNPREDICTABLE in v8M; warn about this. (It is not UNPREDICTABLE in v7M, and our implementation can handle the merely-4-aligned case fine, so we don't need to do anything except warn.) Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 1506092407-26985-8-git-send-email-peter.maydell@linaro.org
2017-10-06target/arm: Check for xPSR mismatch usage faults earlier for v8MPeter Maydell
ARM v8M specifies that the INVPC usage fault for mismatched xPSR exception field and handler mode bit should be checked before updating the PSR and SP, so that the fault is taken with the existing stack frame rather than by pushing a new one. Perform this check in the right place for v8M. Since v7M specifies in its pseudocode that this usage fault check should happen later, we have to retain the original code for that check rather than being able to merge the two. (The distinction is architecturally visible but only in very obscure corner cases like attempting an invalid exception return with an exception frame in read only memory.) Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 1506092407-26985-7-git-send-email-peter.maydell@linaro.org
2017-10-06target/arm: Restore SPSEL to correct CONTROL register on exception returnPeter Maydell
On exception return for v8M, the SPSEL bit in the EXC_RETURN magic value should be restored to the SPSEL bit in the CONTROL register banked specified by the EXC_RETURN.ES bit. Add write_v7m_control_spsel_for_secstate() which behaves like write_v7m_control_spsel() but allows the caller to specify which CONTROL bank to use, reimplement write_v7m_control_spsel() in terms of it, and use it in exception return. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 1506092407-26985-6-git-send-email-peter.maydell@linaro.org
2017-10-06target/arm: Restore security state on exception returnPeter Maydell
Now that we can handle the CONTROL.SPSEL bit not necessarily being in sync with the current stack pointer, we can restore the correct security state on exception return. This happens before we start to read registers off the stack frame, but after we have taken possible usage faults for bad exception return magic values and updated CONTROL.SPSEL. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 1506092407-26985-5-git-send-email-peter.maydell@linaro.org
2017-10-06target/arm: Prepare for CONTROL.SPSEL being nonzero in Handler modePeter Maydell
In the v7M architecture, there is an invariant that if the CPU is in Handler mode then the CONTROL.SPSEL bit cannot be nonzero. This in turn means that the current stack pointer is always indicated by CONTROL.SPSEL, even though Handler mode always uses the Main stack pointer. In v8M, this invariant is removed, and CONTROL.SPSEL may now be nonzero in Handler mode (though Handler mode still always uses the Main stack pointer). In preparation for this change, change how we handle this bit: rename switch_v7m_sp() to the now more accurate write_v7m_control_spsel(), and make it check both the handler mode state and the SPSEL bit. Note that this implicitly changes the point at which we switch active SP on exception exit from before we pop the exception frame to after it. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 1506092407-26985-4-git-send-email-peter.maydell@linaro.org
2017-10-06target/arm: Don't switch to target stack early in v7M exception returnPeter Maydell
Currently our M profile exception return code switches to the target stack pointer relatively early in the process, before it tries to pop the exception frame off the stack. This is awkward for v8M for two reasons: * in v8M the process vs main stack pointer is not selected purely by the value of CONTROL.SPSEL, so updating SPSEL and relying on that to switch to the right stack pointer won't work * the stack we should be reading the stack frame from and the stack we will eventually switch to might not be the same if the guest is doing strange things Change our exception return code to use a 'frame pointer' to read the exception frame rather than assuming that we can switch the live stack pointer this early. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 1506092407-26985-3-git-send-email-peter.maydell@linaro.org
2017-10-06nvic: Clear the vector arrays and prigroup on resetPeter Maydell
Reset for devices does not include an automatic clear of the device state (unlike CPU state, where most of the state structure is cleared to zero). Add some missing initialization of NVIC state that meant that the device was left in the wrong state if the guest did a warm reset. (In particular, since we were resetting the computed state like s->exception_prio but not all the state it was computed from like s->vectors[x].active, the NVIC wound up in an inconsistent state that could later trigger assertion failures.) Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 1506092407-26985-2-git-send-email-peter.maydell@linaro.org
2017-10-06hw/arm/xlnx-zynqmp: Mark the "xlnx, zynqmp" device with user_creatable = falseThomas Huth
The device uses serial_hds in its realize function and thus can't be used twice. Apart from that, the comma in its name makes it quite hard to use for the user anyway, since a comma is normally used to separate the device name from its properties when using the "-device" parameter or the "device_add" HMP command. Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Alistair Francis <alistair.francis@xilinx.com> Message-id: 1506441116-16627-1-git-send-email-thuth@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-10-06hw/sd: fix out-of-bounds check for multi block readsMichael Olbrich
The current code checks if the next block exceeds the size of the card. This generates an error while reading the last block of the card. Do the out-of-bounds check when starting to read a new block to fix this. This issue became visible with increased error checking in Linux 4.13. Cc: qemu-stable@nongnu.org Signed-off-by: Michael Olbrich <m.olbrich@pengutronix.de> Reviewed-by: Alistair Francis <alistair.francis@xilinx.com> Message-id: 20170916091611.10241-1-m.olbrich@pengutronix.de Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-10-06arm: Fix SMC reporting to EL2 when QEMU provides PSCIJan Kiszka
This properly forwards SMC events to EL2 when PSCI is provided by QEMU itself and, thus, ARM_FEATURE_EL3 is off. Found and tested with the Jailhouse hypervisor. Solution based on suggestions by Peter Maydell. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Message-id: 4f243068-aaea-776f-d18f-f9e05e7be9cd@siemens.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-10-06Merge remote-tracking branch 'mreitz/tags/pull-block-2017-10-06' into ↵Kevin Wolf
queue-block Block patches # gpg: Signature made Fri Oct 6 16:30:57 2017 CEST # gpg: using RSA key F407DB0061D5CF40 # gpg: Good signature from "Max Reitz <mreitz@redhat.com>" # Primary key fingerprint: 91BE B60A 30DB 3E88 57D1 1829 F407 DB00 61D5 CF40 * mreitz/tags/pull-block-2017-10-06: block/mirror: check backing in bdrv_mirror_top_flush qcow2: truncate the tail of the image file after shrinking the image qcow2: fix return error code in qcow2_truncate() iotests: Fix 195 if IMGFMT is part of TEST_DIR block/mirror: check backing in bdrv_mirror_top_refresh_filename block: support passthrough of BDRV_REQ_FUA in crypto driver block: convert qcrypto_block_encrypt|decrypt to take bytes offset block: convert crypto driver to bdrv_co_preadv|pwritev block: fix data type casting for crypto payload offset crypto: expose encryption sector size in APIs block: use 1 MB bounce buffers for crypto instead of 16KB Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06block/mirror: check backing in bdrv_mirror_top_flushVladimir Sementsov-Ogievskiy
Backing may be zero after failed bdrv_append in mirror_start_job, which leads to SIGSEGV. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20170929152255.5431-1-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06qcow2: truncate the tail of the image file after shrinking the imagePavel Butsykin
Now after shrinking the image, at the end of the image file, there might be a tail that probably will never be used. So we can find the last used cluster and cut the tail. Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20170929121613.25997-3-pbutsykin@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06qcow2: fix return error code in qcow2_truncate()Pavel Butsykin
Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20170929121613.25997-2-pbutsykin@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06iotests: Fix 195 if IMGFMT is part of TEST_DIRMax Reitz
do_run_qemu() in iotest 195 first applies _filter_imgfmt when printing qemu's command line and _filter_testdir only afterwards. Therefore, if the image format is part of the test directory path, _filter_testdir will no longer apply and the actual output will differ from the reference output even in case of success. For example, TEST_DIR might be "/tmp/test-qcow2", in which case _filter_imgfmt first transforms this to "/tmp/test-IMGFMT" which is no longer recognized as the TEST_DIR by _filter_testdir. Fix this by not applying _filter_imgfmt in do_run_qemu() but in run_qemu() instead, and only after _filter_testdir. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20170927211334.3988-1-mreitz@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06block/mirror: check backing in bdrv_mirror_top_refresh_filenameVladimir Sementsov-Ogievskiy
Backing may be zero after failed bdrv_attach_child in bdrv_set_backing_hd, which leads to SIGSEGV. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20170928120300.58164-1-vsementsov@virtuozzo.com Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06block: support passthrough of BDRV_REQ_FUA in crypto driverDaniel P. Berrange
The BDRV_REQ_FUA flag can trivially be allowed in the crypt driver as a passthrough to the underlying block driver. Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170927125340.12360-7-berrange@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06block: convert qcrypto_block_encrypt|decrypt to take bytes offsetDaniel P. Berrange
Instead of sector offset, take the bytes offset when encrypting or decrypting data. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170927125340.12360-6-berrange@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06block: convert crypto driver to bdrv_co_preadv|pwritevDaniel P. Berrange
Make the crypto driver implement the bdrv_co_preadv|pwritev callbacks, and also use bdrv_co_preadv|pwritev for I/O with the protocol driver beneath. This replaces sector based I/O with byte based I/O, and allows us to stop assuming the physical sector size matches the encryption sector size. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170927125340.12360-5-berrange@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06block: fix data type casting for crypto payload offsetDaniel P. Berrange
The crypto APIs report the offset of the data payload as an uint64_t type, but the block driver is casting to size_t or ssize_t which will potentially truncate. Most of the block APIs use int64_t for offsets meanwhile, so even if using uint64_t in the crypto block driver we are still at risk of truncation. Change the block crypto driver to use uint64_t, but add asserts that the value is less than INT64_MAX. Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170927125340.12360-4-berrange@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06crypto: expose encryption sector size in APIsDaniel P. Berrange
While current encryption schemes all have a fixed sector size of 512 bytes, this is not guaranteed to be the case in future. Expose the sector size in the APIs so the block layer can remove assumptions about fixed 512 byte sectors. Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170927125340.12360-3-berrange@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06block: use 1 MB bounce buffers for crypto instead of 16KBDaniel P. Berrange
Using 16KB bounce buffers creates a significant performance penalty for I/O to encrypted volumes on storage which high I/O latency (rotating rust & network drives), because it triggers lots of fairly small I/O operations. On tests with rotating rust, and cache=none|directsync, write speed increased from 2MiB/s to 32MiB/s, on a par with that achieved by the in-kernel luks driver. With other cache modes the in-kernel driver is still notably faster because it is able to report completion of the I/O request before any encryption is done, while the in-QEMU driver must encrypt the data before completion. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170927125340.12360-2-berrange@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06iotests: Add test 197 for covering copy-on-readEric Blake
Add a test for qcow2 copy-on-read behavior, including exposure for the just-fixed bugs. The copy-on-read behavior is always to a qcow2 image, but the test is careful to allow running with most image protocol/format combos as the backing file being copied from (luks being the exception, as it is harder to pass the right secret to all the right places). In fact, for './check nbd', this appears to be the first time we've had a qcow2 image wrapping NBD, requiring an additional line in _filter_img_create to match the similar line in _filter_img_info. Invoking blkdebug to prove we don't write too much took some effort to get working; and it requires that $TEST_WRAP (based on $TEST_DIR) not be subject to word splitting. We may decide later to have the entire iotests suite use relative rather than absolute names, to avoid problems inherited by the absolute name of $PWD or $TEST_DIR, at which point the sanity check in this commit could be simplified. This test requires at least 2G of consecutive memory to succeed; as such, it is prone to spurious failures, particularly on 32-bit machines under load. This situation is detected and triggers an early exit to skip the test, rather than a failure. To manually provoke this setup on a beefier machine, I used: $ (ulimit -S -v 1000000; ./check -qcow2 197) Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06block: Perform copy-on-read in loopEric Blake
Improve our braindead copy-on-read implementation. Pre-patch, we have multiple issues: - we create a bounce buffer and perform a write for the entire request, even if the active image already has 99% of the clusters occupied, and really only needs to copy-on-read the remaining 1% of the clusters - our bounce buffer was as large as the read request, and can needlessly exhaust our memory by using double the memory of the request size (the original request plus our bounce buffer), rather than a capped maximum overhead beyond the original - if a driver has a max_transfer limit, we are bypassing the normal code in bdrv_aligned_preadv() that fragments to that limit, and instead attempt to read the entire buffer from the driver in one go, which some drivers may assert on - a client can request a large request of nearly 2G such that rounding the request out to cluster boundaries results in a byte count larger than 2G. While this cannot exceed 32 bits, it DOES have some follow-on problems: -- the call to bdrv_driver_pread() can assert for exceeding BDRV_REQUEST_MAX_BYTES, if the driver is old and lacks .bdrv_co_preadv -- if the buffer is all zeroes, the subsequent call to bdrv_co_do_pwrite_zeroes is a no-op due to a negative size, which means we did not actually copy on read Fix all of these issues by breaking up the action into a loop, where each iteration is capped to sane limits. Also, querying the allocation status allows us to optimize: when data is already present in the active layer, we don't need to bounce. Note that the code has a telling comment that copy-on-read should probably be a filter driver rather than a bolt-on hack in io.c; but that remains a task for another day. CC: qemu-stable@nongnu.org Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06block: Add blkdebug hook for copy-on-readEric Blake
Make it possible to inject errors on writes performed during a read operation due to copy-on-read semantics. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06iotests: Restore stty settings on completionEric Blake
Executing qemu with a terminal as stdin will temporarily alter stty settings on that terminal (for example, disabling echo), because of how we run both the monitor and any multiplexing with guest input. Normally, qemu restores the original settings on exit; but if an iotest triggers qemu to abort in the middle, we can be left with the altered terminal setup. This can make life very annoying when debugging an iotest failure (not everyone remembers the trick of blind-typing 'stty sane' without echo, and some people prefer terminal settings that are slightly different than the defaults picked by 'stty sane'). It is possible to avoid qemu corrupting the terminal by not passing a terminal to qemu's stdin in the first place (as in, use './check ... </dev/null'), but that's extra typing to have to remember. But running 'exec </dev/null' in the harness seems like it might be too heavy of a hammer. So I instead went the the solution of saving and restoring the stty settings, only when the harness detects that it is run interactively. I tested this patch by forcing an allocation failure (I can't guarantee that this particular limit will work on all setups, but it shows the idea): $ (ulimit -S -v 500000; ./check -qcow2 1) Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06block: Uniform handling of 0-length bdrv_get_block_status()Eric Blake
Handle a 0-length block status request up front, with a uniform return value claiming the area is not allocated. Most callers don't pass a length of 0 to bdrv_get_block_status() and friends; but it definitely happens with a 0-length read when copy-on-read is enabled. While we could audit all callers to ensure that they never make a 0-length request, and then assert that fact, it was just as easy to fix things to always report success (as long as the callers are careful to not go into an infinite loop). However, we had inconsistent behavior on whether the status is reported as allocated or defers to the backing layer, depending on what callbacks the driver implements, and possibly wasting quite a few CPU cycles to get to that answer. Consistently reporting unallocated up front doesn't really hurt anything, and makes it easier both for callers (0-length requests now have well-defined behavior) and for drivers (drivers don't have to deal with 0-length requests). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06qemu-io: Add -C for opening with copy-on-readEric Blake
Make it easier to enable copy-on-read during iotests, by exposing a new bool option to main and open. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06commit: Remove overlay_bsKevin Wolf
We don't need to make any assumptions about the graph layout above the top node of the commit operation any more. Remove the use of bdrv_find_overlay() and related variables from the commit job code. bdrv_drop_intermediate() doesn't use the 'active' parameter any more, so we can just drop it. The overlay node was previously added to the block job to get a BLK_PERM_GRAPH_MOD. We really need to respect those permissions in bdrv_drop_intermediate() now, but as long as we haven't figured out yet how BLK_PERM_GRAPH_MOD is actually supposed to work, just leave a TODO comment there. With this change, it is now possible to perform another block job on an overlay node without conflicts. qemu-iotests 030 is changed accordingly. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2017-10-06qemu-iotests: Test commit block job where top has two parentsKevin Wolf
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06qemu-iotests: Allow QMP pretty printing in common.qemuKevin Wolf
QMP responses to certain commands can become quite long, which doesn't only make reading them hard, but also means that the maximum line length in patch emails can be exceeded. Allow tests to switch to QMP pretty printing, which results in more, but shorter lines. We also need to make sure to keep indentation in the response for this to work as expected. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2017-10-06commit: Support multiple roots above top nodeKevin Wolf
This changes the commit block job to support operation in a graph where there is more than a single active layer that references the top node. This involves inserting the commit filter node not only on the path between the given active node and the top node, but between the top node and all of its parents. On completion, bdrv_drop_intermediate() must consider all parents for updating the backing file link. These parents may be backing files themselves and as such read-only; reopen them temporarily if necessary. Previously this was achieved by the bdrv_reopen() calls in the commit block job that made overlay_bs read-write for the whole duration of the block job, even though write access is only needed on completion. Now that we consider all parents, overlay_bs is meaningless. It is left in place in this commit, but we'll remove it soon. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06block: Introduce BdrvChildRole.update_filenameKevin Wolf
There is no good reason for bdrv_drop_intermediate() to know the active layer above the subchain it is operating on - even more so, because the assumption that there is a single active layer above it is not generally true. In order to prepare removal of the active parameter, use a BdrvChildRole callback to update the backing file string in the overlay image instead of directly calling bdrv_change_backing_file(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2017-10-06qemu-iotests: merge "check" and "common"Paolo Bonzini
"check" is full of qemu-iotests--specific details. Separating it from "common" does not make much sense anymore. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06qemu-iotests: get rid of $iamPaolo Bonzini
The variable is almost unused, and one of the two uses is actually uninitialized. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06qemu-iotests: fix uninitialized variablePaolo Bonzini
The variable is used in "common" but defined only after the file is sourced. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06qemu-iotests: disintegrate more parts of common.configPaolo Bonzini
Split "check" parts from tests part. For the directory setup, the actual computation of directories goes in "check", while the sanity checks go in the tests. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06qemu-iotests: do not include common.rc in "check"Paolo Bonzini
It only provides functions used by the test programs. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>