aboutsummaryrefslogtreecommitdiff
path: root/util
AgeCommit message (Collapse)Author
2017-12-15sparc: Make sure we mmap at SHMLBA alignmentPeter Maydell
SPARC Linux has an oddity that it insists that mmap() of MAP_FIXED memory must be at an alignment defined by SHMLBA, which is more aligned than the page size (typically, SHMLBA alignment is to 16K, and pages are 8K). This is a relic of ancient hardware that had cache aliasing constraints, but even on modern hardware the kernel still insists on the alignment. To ensure that we get mmap() alignment sufficient to make the kernel happy, change QEMU_VMALLOC_ALIGN, qemu_fd_getpagesize() and qemu_mempath_getpagesize() to use the maximum of getpagesize() and SHMLBA. In particular, this allows 'make check' to pass on Sparc: we were previously failing the ivshmem tests. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 1512752248-17857-1-git-send-email-peter.maydell@linaro.org
2017-11-28sockets: avoid crash when cleaning up sockets for an invalid FDDaniel P. Berrange
If socket_listen_cleanup is passed an invalid FD, then querying the socket local address will fail. We must thus be prepared for the returned addr to be NULL Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2017-11-21coroutine: abort if we try to schedule or enter a pending coroutineJeff Cody
The previous patch fixed a race condition, in which there were coroutines being executing doubly, or after coroutine deletion. We can detect common scenarios when this happens, and print an error message and abort before we corrupt memory / data, or segfault. This patch will abort if an attempt to enter a coroutine is made while it is currently pending execution, either in a specific AioContext bh, or pending execution via a timer. It will also abort if a coroutine is scheduled, before a prior scheduled run has occurred. We cannot rely on the existing co->caller check for recursive re-entry to catch this, as the coroutine may run and exit with COROUTINE_TERMINATE before the scheduled coroutine executes. (This is the scenario that was occurring and fixed in the previous patch). This patch also re-orders the Coroutine struct elements in an attempt to optimize caching. Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-11-16Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into stagingPeter Maydell
Miscellaneous bugfixes # gpg: Signature made Wed 15 Nov 2017 15:27:25 GMT # gpg: using RSA key 0xBFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: fix scripts/update-linux-headers.sh here document exec: Do not resolve subpage in mru_section util/stats64: Fix min/max comparisons cpu-exec: avoid cpu_exec_nocache infinite loop with record/replay cpu-exec: don't overwrite exception_index vhost-user-scsi: add missing virtqueue_size param target-i386: adds PV_TLB_FLUSH CPUID feature bit thread-posix: fix qemu_rec_mutex_trylock macro Makefile: simpler/faster "make help" ioapic/tracing: Remove last DPRINTFs Enable 8-byte wide MMIO for 16550 serial devices Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-11-15util/stats64: Fix min/max comparisonsMax Reitz
stat64_min_slow() and stat64_max_slow() compare the wrong way. This makes iotest 136 fail with clang and -m32. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20171114232223.25207-1-mreitz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-11-08util/async: use atomic_mb_set in qemu_bh_cancelSergio Lopez
Commit b7a745d added a qemu_bh_cancel call to the completion function as an optimization to prevent it from unnecessarily rescheduling itself. This completion function is scheduled from worker_thread, after setting the state of a ThreadPoolElement to THREAD_DONE. This was considered to be safe, as the completion function restarts the loop just after the call to qemu_bh_cancel. But, as this loop lacks a HW memory barrier, the read of req->state may actually happen _before_ the call, seeing it still as THREAD_QUEUED, and ending the completion function without having processed a pending TPE linked at pool->head: worker thread | I/O thread ------------------------------------------------------------------------ | speculatively read req->state req->state = THREAD_DONE; | qemu_bh_schedule(p->completion_bh) | bh->scheduled = 1; | | qemu_bh_cancel(p->completion_bh) | bh->scheduled = 0; | if (req->state == THREAD_DONE) | // sees THREAD_QUEUED The source of the misunderstanding was that qemu_bh_cancel is now being used by the _consumer_ rather than the producer, and therefore now needs to have acquire semantics just like e.g. aio_bh_poll. In some situations, if there are no other independent requests in the same aio context that could eventually trigger the scheduling of the completion function, the omitted TPE and all operations pending on it will get stuck forever. [Added Sergio's updated wording about the HW memory barrier. --Stefan] Signed-off-by: Sergio Lopez <slp@redhat.com> Message-id: 20171108063447.2842-1-slp@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-11-07sockets: avoid leak of listen file descriptorDaniel P. Berrange
If we iterate over the full port range without successfully binding+listening on the socket, we'll try the next address, whereupon we overwrite the slisten file descriptor variable without closing it. Rather than having two places where we open + close socket FDs on different iterations of nested for loops, re-arrange the code to always open+close within the same loop iteration. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2017-11-06aio-posix: drop QEMU_AIO_POLL_MAX_NS env varStefan Hajnoczi
This hunk should not have been merged but I forgot to remove it. Let's remove it before it slips into a QEMU release. ¯\_(ツ)_/¯ Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20171103154041.12617-1-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-11-02oslib-posix: Use sysctl(2) call to resolve exec_dir on NetBSDKamil Rytarowski
NetBSD 8.0(beta) ships with KERN_PROC_PATHNAME in sysctl(2). Older NetBSD versions can use argv[0] parsing fallback. This code section is partly shared with FreeBSD. Signed-off-by: Kamil Rytarowski <n54@gmx.com> Message-id: 20171028194833.23858-1-n54@gmx.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-10-24osdep: introduce qemu_mprotect_rwx/noneEmilio G. Cota
Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-20oslib-posix: Fix compiler warning and some data typesStefan Weil
gcc warning: /qemu/util/oslib-posix.c:304:11: error: variable ‘addr’ might be clobbered by ‘longjmp’ or ‘vfork’ [-Werror=clobbered] Fix also some related data types: numpages, hpagesize are used as pointer offset. Always use size_t for them and also for the derived numpages_per_thread and size_per_thread. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Stefan Weil <sw@weilnetz.de> Message-id: 20171016202912.1117-1-sw@weilnetz.de Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-10-16sockets: Handle race condition between binds to the same portKnut Omang
If an offset of ports is specified to the inet_listen_saddr function(), and two or more processes tries to bind from these ports at the same time, occasionally more than one process may be able to bind to the same port. The condition is detected by listen() but too late to avoid a failure. This function is called by socket_listen() and used by all socket listening code in QEMU, so all cases where any form of dynamic port selection is used should be subject to this issue. Add code to close and re-establish the socket when this condition is observed, hiding the race condition from the user. Also clean up some issues with error handling to allow more accurate reporting of the cause of an error. This has been developed and tested by means of the test-listen unit test in the previous commit. Enable the test for make check now that it passes. Reviewed-by: Bhavesh Davda <bhavesh.davda@oracle.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Girish Moodalbail <girish.moodalbail@oracle.com> Signed-off-by: Knut Omang <knut.omang@oracle.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2017-10-16sockets: factor out create_fast_reuse_socketKnut Omang
Another refactoring step to prepare for fixing the problem exposed with the test-listen test in the previous commit Signed-off-by: Knut Omang <knut.omang@oracle.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2017-10-16sockets: factor out a new try_bind() functionKnut Omang
A refactoring step to prepare for the problem exposed by the test-listen test in the previous commit. Simplify and reorganize the IPv6 specific extra measures and move it out of the for loop to increase code readability. No semantic changes. Signed-off-by: Knut Omang <knut.omang@oracle.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2017-10-10util: move qemu_real_host_page_size/mask to osdep.hEmilio G. Cota
These only depend on the host and therefore belong in the common osdep, not in a target-dependent object. While at it, query the host during an init constructor, which guarantees the page size will be well-defined throughout the execution of the program. Suggested-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-09config: qemu_config_parse() return number of config groupsEduardo Habkost
Change qemu_config_parse() to return the number of config groups in success and -EINVAL on error. This will allow callers of qemu_config_parse() to check if something was really loaded from the config file. All existing callers of qemu_config_parse() and qemu_read_config_file() only check if the return value was negative, so the change shouldn't affect them. Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20171004025043.3788-2-ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-10-06hbitmap: Rename serialization_granularity to serialization_alignEric Blake
The only client of hbitmap_serialization_granularity() is dirty-bitmap's bdrv_dirty_bitmap_serialization_align(). Keeping the two names consistent is worthwhile, and the shorter name is more representative of what the function returns (the required alignment to be used for start/count of other serialization functions, where violating the alignment causes assertion failures). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-03aio: fix assert when remove poll during destroyStefan Hajnoczi
After iothread is enabled internally inside QEMU with GMainContext, we may encounter this warning when destroying the iothread: (qemu-system-x86_64:19925): GLib-CRITICAL **: g_source_remove_poll: assertion '!SOURCE_DESTROYED (source)' failed The problem is that g_source_remove_poll() does not allow to remove one source from array if the source is detached from its owner context. (peterx: which IMHO does not make much sense) Fix it on QEMU side by avoid calling g_source_remove_poll() if we know the object is during destruction, and we won't leak anything after all since the array will be gone soon cleanly even with that fd. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-id: 20170928025958.1420-6-peterx@redhat.com [peterx: write the commit message] Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-09-27Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into stagingPeter Maydell
Block layer patches # gpg: Signature made Tue 26 Sep 2017 14:52:32 BST # gpg: using RSA key 0x7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (24 commits) block/qcow2-bitmap: fix use of uninitialized pointer qemu-iotests: add shrinking image test qcow2: add shrink image support qcow2: add qcow2_cache_discard qemu-img: add --shrink flag for resize iotests: fix 181: enable postcopy-ram capability on target qemu-iotests: Test change-backing-file command block: Fix permissions after bdrv_reopen() block: reopen: Queue children after their parents block: Base permissions on rw state after reopen block: Add reopen queue to bdrv_check_perm() block: Add reopen_queue to bdrv_child_perm() qemu-io: Drop write permissions before read-only reopen block: Clean up some bad code in the vvfat driver block/throttle-groups.c: allocate RestartData on the heap throttle: Assert that bkt->max is valid in throttle_compute_wait() iotests: Print full path of bad output if mismatch iotests: use virtio aliases for 067 iotests: use -ccw on s390x for 051 iotests: use -ccw on s390x for 040, 139, and 182 ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-09-26throttle: Assert that bkt->max is valid in throttle_compute_wait()Alberto Garcia
If bkt->max == 0 and bkt->burst_length > 1 then we could have a division by 0 in throttle_do_compute_wait(). That configuration is however not permitted and is already detected by throttle_is_valid(), but let's assert it in throttle_compute_wait() to make it explicit. Found by Coverity (CID: 1381016). Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-09-26util/qemu-thread-posix.c: Replace OS ifdefs with CONFIG_HAVE_SEM_TIMEDWAITPeter Maydell
In qemu-thread-posix.c we have two implementations of the various qemu_sem_* functions, one of which uses native POSIX sem_* and the other of which emulates them with pthread conditions. This is necessary because not all our host OSes support sem_timedwait(). Instead of a hard-coded list of OSes which don't implement sem_timedwait(), which gets out of date, make configure test for the presence of the function and set a new CONFIG_HAVE_SEM_TIMEDWAIT appropriately. In particular, newer NetBSDs have sem_timedwait(), so this commit will switch them over to using it. OSX still does not have an implementation. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Kamil Rytarowski <n54@gmx.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-09-22bitmap: provide to_le/from_le helpersPeter Xu
Provide helpers to convert bitmaps to little endian format. It can be used when we want to send one bitmap via network to some other hosts. One thing to mention is that, these helpers only solve the problem of endianess, but it does not solve the problem of different word size on machines (the bitmaps managing same count of bits may contains different size when malloced). So we need to take care of the size alignment issue on the callers for now. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2017-09-22bitmap: introduce bitmap_count_one()Peter Xu
Count how many bits set in the bitmap. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2017-09-22bitmap: remove BITOP_WORD()Peter Xu
We have BIT_WORD(). It's the same. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2017-09-19Convert remaining single line fprintf() to warn_report()Alistair Francis
Convert any remaining uses of fprintf(stderr, "warning:"... to use warn_report() instead. This helps standardise on a single method of printing warnings to the user. All of the warnings were changed using this command: find ./* -type f -exec sed -i 's|fprintf(.*".*warning[,:] |warn_report("|Ig' {} + The #include lines and chagnes to the test Makefile were manually updated to allow the code to compile. Signed-off-by: Alistair Francis <alistair.francis@xilinx.com> Message-Id: <2c94ac3bb116cc6b8ebbcd66a254920a69665515.1503077821.git.alistair.francis@xilinx.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-19Convert multi-line fprintf() to warn_report()Alistair Francis
Convert all the multi-line uses of fprintf(stderr, "warning:"..."\n"... to use warn_report() instead. This helps standardise on a single method of printing warnings to the user. All of the warnings were changed using these commands: find ./* -type f -exec sed -i \ 'N; {s|fprintf(.*".*warning[,:] \(.*\)\\n"\(.*\));|warn_report("\1"\2);|Ig}' \ {} + find ./* -type f -exec sed -i \ 'N;N; {s|fprintf(.*".*warning[,:] \(.*\)\\n"\(.*\));|warn_report("\1"\2);|Ig}' \ {} + find ./* -type f -exec sed -i \ 'N;N;N; {s|fprintf(.*".*warning[,:] \(.*\)\\n"\(.*\));|warn_report("\1"\2);|Ig}' \ {} + find ./* -type f -exec sed -i \ 'N;N;N;N {s|fprintf(.*".*warning[,:] \(.*\)\\n"\(.*\));|warn_report("\1"\2);|Ig}' \ {} + find ./* -type f -exec sed -i \ 'N;N;N;N;N {s|fprintf(.*".*warning[,:] \(.*\)\\n"\(.*\));|warn_report("\1"\2);|Ig}' \ {} + find ./* -type f -exec sed -i \ 'N;N;N;N;N;N {s|fprintf(.*".*warning[,:] \(.*\)\\n"\(.*\));|warn_report("\1"\2);|Ig}' \ {} + find ./* -type f -exec sed -i \ 'N;N;N;N;N;N;N; {s|fprintf(.*".*warning[,:] \(.*\)\\n"\(.*\));|warn_report("\1"\2);|Ig}' \ {} + Indentation fixed up manually afterwards. Some of the lines were manually edited to reduce the line length to below 80 charecters. Some of the lines with newlines in the middle of the string were also manually edit to avoid checkpatch errrors. The #include lines were manually updated to allow the code to compile. Several of the warning messages can be improved after this patch, to keep this patch mechanical this has been moved into a later patch. Signed-off-by: Alistair Francis <alistair.francis@xilinx.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Max Reitz <mreitz@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Anthony Perard <anthony.perard@citrix.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Aurelien Jarno <aurelien@aurel32.net> Cc: Yongbok Kim <yongbok.kim@imgtec.com> Cc: Cornelia Huck <cohuck@redhat.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Alexander Graf <agraf@suse.de> Cc: Jason Wang <jasowang@redhat.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Gerd Hoffmann <kraxel@redhat.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <5def63849ca8f551630c6f2b45bcb1c482f765a6.1505158760.git.alistair.francis@xilinx.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-19scsi: move non-emulation specific code to scsi/Paolo Bonzini
util/scsi.c includes some SCSI code that is shared by block/iscsi.c and hw/scsi, but the introduction of the persistent reservation helper will add many more instances of this. There is also include/block/scsi.h, which actually is not part of the core block layer. The persistent reservation manager will also need a home. A scsi/ directory provides one for both the aforementioned shared code and the PR manager code. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-19scsi: Introduce scsi_sense_buf_to_errnoFam Zheng
This recognizes the "fixed" and "descriptor" format sense data, extracts the sense key/asc/ascq fields then converts them to an errno. Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20170821141008.19383-4-famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-19scsi: Improve scsi_sense_to_errnoFam Zheng
Tweak the errno mapping to return more accurate/appropriate values. Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20170821141008.19383-3-famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-19scsi: Refactor scsi sense interpreting codeFam Zheng
So that it can be reused outside of iscsi.c. Also update MAINTAINERS to include the new files in SCSI section. Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20170821141008.19383-2-famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-07configure: Drop AIX host supportPeter Maydell
Nobody has mentioned AIX host support on the mailing list for years, and we have no test systems for it so it is most likely broken. We've advertised in configure for two releases now that we plan to drop support for this host OS, and have had no complaints. Drop the AIX host support code. We can also drop the now-unused AIX version of sys_cache_info(). Note that the _CALL_AIX define used in the PPC tcg backend is also used for Linux PPC64, and so that code should not be removed. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1504545540-8002-1-git-send-email-peter.maydell@linaro.org
2017-09-07Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into stagingPeter Maydell
Block layer patches # gpg: Signature made Wed 06 Sep 2017 14:44:41 BST # gpg: using RSA key 0x7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: qcow2: move qcow2_store_persistent_dirty_bitmaps() before cache flushing qemu-iotests: add 184 for throttle filter driver block: add throttle block filter driver block: convert ThrottleGroup to object with QOM block: tidy ThrottleGroupMember initializations block: add aio_context field in ThrottleGroupMember block: move ThrottleGroup membership to ThrottleGroupMember block: document semantics of bdrv_co_preadv|pwritev qcow: Check failure of bdrv_getlength() and bdrv_truncate() qcow: Change signature of get_cluster_offset() block: add default implementations for bdrv_co_get_block_status() block: remove bdrv_truncate callback in blkdebug block: remove unused bdrv_media_changed block: pass bdrv_* methods to bs->file by default in block filters Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-09-05block: convert ThrottleGroup to object with QOMManos Pitsidianakis
ThrottleGroup is converted to an object. This will allow the future throttle block filter drive easy creation and configuration of throttle groups in QMP and cli. A new QAPI struct, ThrottleLimits, is introduced to provide a shared struct for all throttle configuration needs in QMP. ThrottleGroups can be created via CLI as -object throttle-group,id=foo,x-iops-total=100,x-.. where x-* are individual limit properties. Since we can't add non-scalar properties in -object this interface must be used instead. However, setting these properties must be disabled after initialization because certain combinations of limits are forbidden and thus configuration changes should be done in one transaction. The individual properties will go away when support for non-scalar values in CLI is implemented and thus are marked as experimental. ThrottleGroup also has a `limits` property that uses the ThrottleLimits struct. It can be used to create ThrottleGroups or set the configuration in existing groups as follows: { "execute": "object-add", "arguments": { "qom-type": "throttle-group", "id": "foo", "props" : { "limits": { "iops-total": 100 } } } } { "execute" : "qom-set", "arguments" : { "path" : "foo", "property" : "limits", "value" : { "iops-total" : 99 } } } This also means a group's configuration can be fetched with qom-get. Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-09-05Merge remote-tracking branch 'remotes/berrange/tags/pull-qio-20170905-2' ↵Peter Maydell
into staging Merge QEMU I/O 2017/09/05 v2 # gpg: Signature made Tue 05 Sep 2017 13:22:36 BST # gpg: using RSA key 0xBE86EBB415104FDF # gpg: Good signature from "Daniel P. Berrange <dan@berrange.com>" # gpg: aka "Daniel P. Berrange <berrange@redhat.com>" # Primary key fingerprint: DAF3 A6FD B26B 6291 2D0E 8E3F BE86 EBB4 1510 4FDF * remotes/berrange/tags/pull-qio-20170905-2: io: fix check for handshake completion in TLS test io: add new qio_channel_{readv, writev, read, write}_all functions io: fix typo in docs comment for qio_channel_read util: remove the obsolete non-blocking connect io: fix temp directory used by test-io-channel-tls test Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-09-05util: remove the obsolete non-blocking connectCao jin
The non-blocking connect mechanism is obsolete, and it doesn't work well in inet connection, because it will call getaddrinfo first and getaddrinfo will blocks on DNS lookups. Since commit e65c67e4 & d984464e, the non-blocking connect of migration goes through QIOChannel in a different manner(using a thread), and nobody use this old non-blocking connect anymore. Any newly written code which needs a non-blocking connect should use the QIOChannel code, so we can drop NonBlockingConnectHandler as a concept entirely. Suggested-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com> Signed-off-by: Mao Zhongyi <maozy.fnst@cn.fujitsu.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2017-09-04qapi: Generate FOO_str() macro for QAPI enum FOOMarkus Armbruster
The next commit will put it to use. May look pointless now, but we're going to change the FOO_lookup's type, and then it'll help. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1503564371-26090-13-git-send-email-armbru@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2017-08-30oslib-posix: Print errors before aborting on qemu_alloc_stack()Eduardo Habkost
If QEMU is running on a system that's out of memory and mmap() fails, QEMU aborts with no error message at all, making it hard to debug the reason for the failure. Add perror() calls that will print error information before aborting. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 20170829212053.6003-1-ehabkost@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-08-29throttle: Make burst_length 64bit and add range checksAlberto Garcia
LeakyBucket.burst_length is defined as an unsigned integer but the code never checks for overflows and it only makes sure that the value is not 0. In practice this means that the user can set something like throttling.iops-total-max-length=4294967300 despite being larger than UINT_MAX and the final value after casting to unsigned int will be 4. This patch changes the data type to uint64_t. This does not increase the storage size of LeakyBucket, and allows us to assign the value directly from qemu_opt_get_number() or BlockIOThrottle and then do the checks directly in throttle_is_valid(). The value of burst_length does not have a specific upper limit, but since the bucket size is defined by max * burst_length we have to prevent overflows. Instead of going for UINT64_MAX or something similar this patch reuses THROTTLE_VALUE_MAX, which allows I/O bursts of 1 GiB/s for 10 days in a row. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: 1b2e3049803f71cafb2e1fa1be4fb47147a0d398.1503580370.git.berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-08-29throttle: Make LeakyBucket.avg and LeakyBucket.max integer typesAlberto Garcia
Both the throttling limits set with the throttling.iops-* and throttling.bps-* options and their QMP equivalents defined in the BlockIOThrottle struct are integer values. Those limits are also reported in the BlockDeviceInfo struct and they are integers there as well. Therefore there's no reason to store them internally as double and do the conversion everytime we're setting or querying them, so this patch uses uint64_t for those types. Let's also use an unsigned type because we don't allow negative values anyway. LeakyBucket.level and LeakyBucket.burst_level do however remain double because their value changes depending on the fraction of time elapsed since the previous I/O operation. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: f29b840422767b5be2c41c2dfdbbbf6c5f8fedf8.1503580370.git.berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-08-29throttle: Remove throttle_fix_bucket() / throttle_unfix_bucket()Alberto Garcia
The throttling code can change internally the value of bkt->max if it hasn't been set by the user. The problem with this is that if we want to retrieve the original value we have to undo this change first. This is ugly and unnecessary: this patch removes the throttle_fix_bucket() and throttle_unfix_bucket() functions completely and moves the logic to throttle_compute_wait(). Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Message-id: 5b0b9e1ac6eb208d709eddc7b09e7669a523bff3.1503580370.git.berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-08-29throttle: Make throttle_is_valid() a bit less verboseAlberto Garcia
Use a pointer to the bucket instead of repeating cfg->buckets[i] all the time. This makes the code more concise and will help us expand the checks later and save a few line breaks. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: 763ffc40a26b17d54cf93f5a999e4656049fcf0c.1503580370.git.berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-08-29throttle: Update the throttle_fix_bucket() documentationAlberto Garcia
The way the throttling algorithm works is that requests start being throttled once the bucket level exceeds the burst limit. When we get there the bucket leaks at the level set by the user (bkt->avg), and that leak rate is what prevents guest I/O from exceeding the desired limit. If we don't allow bursts (i.e. bkt->max == 0) then we can start throttling requests immediately. The problem with keeping the threshold at 0 is that it only allows one request at a time, and as soon as there's a bit of I/O from the guest every other request will be throttled and performance will suffer considerably. That can even make the guest unable to reach the throttle limit if that limit is high enough, and that happens regardless of the block scheduler used by the guest. Increasing that threshold gives flexibility to the guest, allowing it to perform short bursts of I/O before being throttled. Increasing the threshold too much does not make a difference in the long run (because it's the leak rate what defines the actual throughput) but it does allow the guest to perform longer initial bursts and exceed the throttle limit for a short while. A burst value of bkt->avg / 10 allows the guest to perform 100ms' worth of I/O at the target rate without being throttled. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: 31aae6645f0d1fbf3860fb2b528b757236f0c0a7.1503580370.git.berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-08-11osdep: Add runtime OFD lock detectionFam Zheng
Build time check of OFD lock is not sufficient and can cause image open errors when the runtime environment doesn't support it. Add a helper function to probe it at runtime, additionally. Also provide a qemu_has_ofd_lock() for callers to check the status. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-08-08Revert "rcu: do not create thread in pthread_atfork callback"Paolo Bonzini
This reverts commit a59629fcc6f603e19b516dc08f75334e5c480bd0. This is not needed anymore because the IOThread mutex is not "magic" anymore (need not kick the CPU thread)and also because fork callbacks are only enabled at the very beginning of QEMU's execution. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-08-08rcu: completely disable pthread_atfork callbacks as soon as possiblePaolo Bonzini
Because of -daemonize, system mode QEMU sometimes needs to fork() and keep RCU enabled in the child. However, there is a possible deadlock with synchronize_rcu: - the CPU thread is inside a RCU critical section and wants to take the BQL in order to do MMIO - the monitor thread, which is owning the BQL, calls rcu_init_lock which tries to take the rcu_sync_lock - the call_rcu thread has taken rcu_sync_lock in synchronize_rcu, but synchronize_rcu needs the CPU thread to end the critical section before returning. This cannot happen for user-mode emulation, because it does not have a BQL. To fix it, assume that system mode QEMU only forks in preparation for exec (except when daemonizing) and disable pthread_atfork as soon as the double fork has happened. Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Tested-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-07-31docs: fix broken paths to docs/devel/tracing.txtPhilippe Mathieu-Daudé
With the move of some docs/ to docs/devel/ on ac06724a71, no references were updated. Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-07-24Merge remote-tracking branch 'remotes/armbru/tags/pull-error-2017-07-24' ↵Peter Maydell
into staging Error reporting patches for 2017-07-24 # gpg: Signature made Mon 24 Jul 2017 13:17:49 BST # gpg: using RSA key 0x3870B400EB918653 # gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" # gpg: aka "Markus Armbruster <armbru@pond.sub.org>" # Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653 * remotes/armbru/tags/pull-error-2017-07-24: error: Revert unwanted change of warning messages Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-07-24error: Revert unwanted change of warning messagesMarkus Armbruster
Commit 97f4030 changed warning messages from timestamp-if-enabled progname ":" location "warning: " message to "warning: " timestamp-if-enabled progname ":" location message This regressed qemu-iotests 051. Put "warning: " right back where it was, along with "info: ". Reported-by: Kevin Wolf <kwolf@redhat.com> Cc: Alistair Francis <alistair.francis@xilinx.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1500449614-16811-1-git-send-email-armbru@redhat.com> Reviewed-by: Alistair Francis <alistair.francis@xilinx.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2017-07-24util: Introduce include/qemu/cpuid.hRichard Henderson
Clang 3.9 passes the CONFIG_AVX2_OPT configure test. However, the supplied <cpuid.h> does not contain the bit_AVX2 define that we use when detecting whether the routine can be enabled. Introduce a qemu-specific header that uses the compiler's definition of __cpuid et al, but supplies any missing bit_* definitions needed. This avoids introducing any extra ifdefs to util/bufferiszero.c, and allows quite a few to be removed from tcg/i386/tcg-target.inc.c. Signed-off-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20170719044018.18063-1-rth@twiddle.net Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-07-21util/oslib-posix.c: Avoid warning on NetBSDPeter Maydell
On NetBSD the compiler warns: util/oslib-posix.c: In function 'sigaction_invoke': util/oslib-posix.c:589:5: warning: missing braces around initializer [-Wmissing-braces] siginfo_t si = { 0 }; ^ util/oslib-posix.c:589:5: warning: (near initialization for 'si.si_pad') [-Wmissing-braces] because on this platform siginfo_t is defined as typedef union siginfo { char si_pad[128]; /* Total size; for future expansion */ struct _ksiginfo _info; } siginfo_t; Avoid this warning by initializing the struct with {} instead; this is a GCC extension but we use it all over the codebase already. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1500568341-8389-1-git-send-email-peter.maydell@linaro.org