Age | Commit message (Collapse) | Author |
|
virtio,pc,pci: features, cleanups, fixes
vhost-user enabled on non-linux systems
beginning of nvme sriov support
bigger tx queue for vdpa
virtio iommu bypass
FADT flag to detect legacy keyboards
Fixes, cleanups all over the place
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Mon 07 Mar 2022 22:43:31 GMT
# gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg: issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* remotes/mst/tags/for_upstream: (47 commits)
hw/acpi/microvm: turn on 8042 bit in FADT boot architecture flags if present
tests/acpi: i386: update FACP table differences
hw/acpi: add indication for i8042 in IA-PC boot flags of the FADT table
tests/acpi: i386: allow FACP acpi table changes
docs: vhost-user: add subsection for non-Linux platforms
configure, meson: allow enabling vhost-user on all POSIX systems
vhost: use wfd on functions setting vring call fd
event_notifier: add event_notifier_get_wfd()
pci: drop COMPAT_PROP_PCP for 2.0 machine types
hw/smbios: Add table 4 parameter, "processor-id"
x86: cleanup unused compat_apic_id_mode
vhost-vsock: detach the virqueue element in case of error
pc: add option to disable PS/2 mouse/keyboard
acpi: pcihp: pcie: set power on cap on parent slot
pci: expose TYPE_XIO3130_DOWNSTREAM name
pci: show id info when pci BDF conflict
hw/misc/pvpanic: Use standard headers instead
headers: Add pvpanic.h
pci-bridge/xio3130_downstream: Fix error handling
pci-bridge/xio3130_upstream: Fix error handling
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
# Conflicts:
# docs/specs/index.rst
|
|
'remotes/pmaydell/tags/pull-target-arm-20220307' into staging
target-arm queue:
* cleanups of qemu_oom_check() and qemu_memalign()
* target/arm/translate-neon: UNDEF if VLD1/VST1 stride bits are non-zero
* target/arm/translate-neon: Simplify align field check for VLD3
* GICv3 ITS: add more trace events
* GICv3 ITS: implement 8-byte accesses properly
* GICv3: fix minor issues with some trace/log messages
* ui/cocoa: Use the standard about panel
* target/arm: Provide cpu property for controling FEAT_LPA2
* hw/arm/virt: Disable LPA2 for -machine virt-6.2
# gpg: Signature made Mon 07 Mar 2022 16:46:06 GMT
# gpg: using RSA key E1A5C593CD419DE28E8315CF3C2525ED14360CDE
# gpg: issuer "peter.maydell@linaro.org"
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>" [ultimate]
# gpg: aka "Peter Maydell <pmaydell@gmail.com>" [ultimate]
# gpg: aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>" [ultimate]
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83 15CF 3C25 25ED 1436 0CDE
* remotes/pmaydell/tags/pull-target-arm-20220307:
hw/arm/virt: Disable LPA2 for -machine virt-6.2
target/arm: Provide cpu property for controling FEAT_LPA2
ui/cocoa: Use the standard about panel
hw/intc/arm_gicv3_cpuif: Fix register names in ICV_HPPIR read trace event
hw/intc/arm_gicv3: Fix missing spaces in error log messages
hw/intc/arm_gicv3: Specify valid and impl in MemoryRegionOps
hw/intc/arm_gicv3_its: Add trace events for table reads and writes
hw/intc/arm_gicv3_its: Add trace events for commands
target/arm/translate-neon: Simplify align field check for VLD3
target/arm/translate-neon: UNDEF if VLD1/VST1 stride bits are non-zero
osdep: Move memalign-related functions to their own header
util: Put qemu_vfree() in memalign.c
util: Use meson checks for valloc() and memalign() presence
util: Share qemu_try_memalign() implementation between POSIX and Windows
meson.build: Don't misdetect posix_memalign() on Windows
util: Return valid allocation for qemu_try_memalign() with zero size
util: Unify implementations of qemu_memalign()
util: Make qemu_oom_check() a static function
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
|
|
into staging
9pfs: introduce macOS host support and cleanup
* Add support for Darwin (a.k.a. macOS) hosts.
* Code cleanup (move qemu_dirent_dup() from osdep -> 9p-util).
* API doc cleanup (convert Doxygen -> kerneldoc format).
# gpg: Signature made Mon 07 Mar 2022 11:14:45 GMT
# gpg: using RSA key 96D8D110CF7AF8084F88590134C2B58765A47395
# gpg: issuer "qemu_oss@crudebyte.com"
# gpg: Good signature from "Christian Schoenebeck <qemu_oss@crudebyte.com>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: ECAB 1A45 4014 1413 BA38 4926 30DB 47C3 A012 D5F4
# Subkey fingerprint: 96D8 D110 CF7A F808 4F88 5901 34C2 B587 65A4 7395
* remotes/cschoenebeck/tags/pull-9p-20220307:
fsdev/p9array.h: convert Doxygen -> kerneldoc format
9pfs/coth.h: drop Doxygen format on v9fs_co_run_in_worker()
9pfs/9p-util.h: convert Doxygen -> kerneldoc format
9pfs/9p.c: convert Doxygen -> kerneldoc format
9pfs/codir.c: convert Doxygen -> kerneldoc format
9pfs/9p.h: convert Doxygen -> kerneldoc format
9pfs: drop Doxygen format from qemu_dirent_dup() API comment
9pfs: move qemu_dirent_dup() from osdep -> 9p-util
9p: darwin: meson: Allow VirtFS on Darwin
9p: darwin: Adjust assumption on virtio-9p-test
9p: darwin: Implement compatibility for mknodat
9p: darwin: Compatibility for f/l*xattr
9p: darwin: *xattr_nofollow implementations
9p: darwin: Move XATTR_SIZE_MAX->P9_XATTR_SIZE_MAX
9p: darwin: Ignore O_{NOATIME, DIRECT}
9p: darwin: Handle struct dirent differences
9p: darwin: Handle struct stat(fs) differences
9p: Rename 9p-util -> 9p-util-linux
9p: linux: Fix a couple Linux assumptions
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
|
|
Move the various memalign-related functions out of osdep.h and into
their own header, which we include only where they are used.
While we're doing this, add some brief documentation comments.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20220226180723.1706285-10-peter.maydell@linaro.org
|
|
qemu_vfree() is the companion free function to qemu_memalign(); put
it in memalign.c so the allocation and free functions are together.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220226180723.1706285-9-peter.maydell@linaro.org
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
|
|
Instead of assuming that all CONFIG_BSD have valloc() and anything
else is memalign(), explicitly check for those functions in
meson.build and use the "is the function present" define. Tests for
specific functionality are better than which-OS checks; this also
lets us give a helpful error message if somehow there's no usable
function present.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20220226180723.1706285-8-peter.maydell@linaro.org
|
|
The qemu_try_memalign() functions for POSIX and Windows used to be
significantly different, but these days they are identical except for
the actual allocation function called, and the POSIX version already
has to have ifdeffery for different allocation functions.
Move to a single implementation in memalign.c, which uses the Windows
_aligned_malloc if we detect that function in meson.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220226180723.1706285-7-peter.maydell@linaro.org
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
|
|
Currently qemu_try_memalign()'s behaviour if asked to allocate
0 bytes is rather variable:
* on Windows, we will assert
* on POSIX platforms, we get the underlying behaviour of
the posix_memalign() or equivalent function, which may be
either "return a valid non-NULL pointer" or "return NULL"
Explictly check for 0 byte allocations, so we get consistent
behaviour across platforms. We handle them by incrementing the size
so that we return a valid non-NULL pointer that can later be passed
to qemu_vfree(). This is permitted behaviour for the
posix_memalign() API and is the most usual way that underlying
malloc() etc implementations handle a zero-sized allocation request,
because it won't trip up calling code that assumes NULL means an
error. (This includes our own qemu_memalign(), which will abort on
NULL.)
This change is a preparation for sharing the qemu_try_memalign() code
between Windows and POSIX.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
|
|
We implement qemu_memalign() in both oslib-posix.c and oslib-win32.c,
but the two versions are essentially the same: they call
qemu_try_memalign(), and abort() after printing an error message if
it fails. The only difference is that the win32 version prints the
GetLastError() value whereas the POSIX version prints
strerror(errno). However, this is a bug in the win32 version: in
commit dfbd0b873a85021 in 2020 we changed the implementation of
qemu_try_memalign() from using VirtualAlloc() (which sets the
GetLastError() value) to using _aligned_malloc() (which sets errno),
but didn't update the error message to match.
Replace the two separate functions with a single version in a
new memalign.c file, which drops the unnecessary extra qemu_oom_check()
function and instead prints a more useful message including the
requested size and alignment as well as the errno string.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220226180723.1706285-4-peter.maydell@linaro.org
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
|
|
The qemu_oom_check() function, which we define in both oslib-posix.c
and oslib-win32.c, is now used only locally in that file; make it
static.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20220226180723.1706285-3-peter.maydell@linaro.org
|
|
Function qemu_dirent_dup() is currently only used by 9pfs server, so move
it from project global header osdep.h to 9pfs specific header 9p-util.h.
Link: https://lore.kernel.org/qemu-devel/CAFEAcA_=HAUNomKD2wurSVaAHa5mrk22A1oHKLWUDjk7v6Khmg@mail.gmail.com/
Based-on: <20220227223522.91937-12-wwcohen@gmail.com>
Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <E1nP9Oz-00043L-KJ@lizzy.crudebyte.com>
|
|
Add a convenient function similar with bdrv_block_status() to get
status of dirty bitmap.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220303194349.2304213-9-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
|
|
event_notifier_get_fd(const EventNotifier *e) always returns
EventNotifier's read file descriptor (rfd). This is not a problem when
the EventNotifier is backed by a an eventfd, as a single file
descriptor is used both for reading and triggering events (rfd ==
wfd).
But, when EventNotifier is backed by a pipe pair, we have two file
descriptors, one that can only be used for reads (rfd), and the other
only for writes (wfd).
There's, at least, one known situation in which we need to obtain wfd
instead of rfd, which is when setting up the file that's going to be
sent to the peer in vhost's SET_VRING_CALL.
Add a new event_notifier_get_wfd(const EventNotifier *e) that can be
used to obtain wfd where needed.
Signed-off-by: Sergio Lopez <slp@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20220304100854.14829-2-slp@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
RCU may be used from coroutines. Standard __thread variables cannot be
used by coroutines. Use the coroutine TLS macros instead.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20220222140150.27240-4-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
QEMU TLS macros must be used to make TLS variables safe with coroutines.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20220222140150.27240-3-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
According to the grammar, a key __com.redhat_foo would be parsed as
two key fragments __com and redhat_foo. It's actually parsed as a
single fragment. Fix the grammar.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20220218145551.892787-2-armbru@redhat.com>
|
|
staging
* More Meson conversions (0.59.x now required rather than suggested)
* UMIP support for TCG x86
* Fix migration crash
* Restore error output for check-block
# gpg: Signature made Mon 21 Feb 2022 09:35:59 GMT
# gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg: issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1
# Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83
* remotes/bonzini-gitlab/tags/for-upstream: (29 commits)
configure, meson: move CONFIG_IASL to a Meson option
meson, configure: move ntddscsi API check to meson
meson: require dynamic linking for VSS support
qga/vss-win32: require widl/midl, remove pre-built TLB file
meson: do not make qga/vss-win32/meson.build conditional on C++ presence
configure, meson: replace VSS SDK checks and options with --enable-vss-sdk
qga/vss: use standard windows headers location
qga/vss-win32: use widl if available
meson: drop --with-win-sdk
qga/vss-win32: fix midl arguments
meson: refine check for whether to look for virglrenderer
configure, meson: move guest-agent, tools to meson
configure, meson: move smbd options to meson_options.txt
configure, meson: move coroutine options to meson_options.txt
configure, meson: move some default-disabled options to meson_options.txt
meson: define qemu_cflags/qemu_ldflags
configure, meson: move block layer options to meson_options.txt
configure, meson: move image format options to meson_options.txt
configure, meson: cleanup qemu-ga libraries
configure, meson: move TPM check to meson
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
|
|
The "hardware version" machinery (qemu_set_hw_version(),
qemu_hw_version(), and the QEMU_HW_VERSION define) is used by fewer
than 10 files. Move it out from osdep.h into a new
qemu/hw-version.h.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220208200856.3558249-6-peter.maydell@linaro.org
|
|
The qemu_icache_linesize, qemu_icache_linesize_log,
qemu_dcache_linesize, and qemu_dcache_linesize_log variables are not
used in many files. Move them out of osdep.h to a new
qemu/cacheinfo.h, and document them.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220208200856.3558249-5-peter.maydell@linaro.org
|
|
The qemu_mprotect_*() family of functions are used in very few files;
move them from osdep.h to a new qemu/mprotect.h.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220208200856.3558249-3-peter.maydell@linaro.org
|
|
The function qemu_madvise() and the QEMU_MADV_* constants associated
with it are used in only 10 files. Move them out of osdep.h to a new
qemu/madvise.h header that is included where it is needed.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220208200856.3558249-2-peter.maydell@linaro.org
|
|
`struct dirent' returned from readdir(3) could be shorter (or longer)
than `sizeof(struct dirent)', thus memcpy of sizeof length will overread
into unallocated page causing SIGSEGV. Example stack trace:
#0 0x00005555559ebeed v9fs_co_readdir_many (/usr/bin/qemu-system-x86_64 + 0x497eed)
#1 0x00005555559ec2e9 v9fs_readdir (/usr/bin/qemu-system-x86_64 + 0x4982e9)
#2 0x0000555555eb7983 coroutine_trampoline (/usr/bin/qemu-system-x86_64 + 0x963983)
#3 0x00007ffff73e0be0 n/a (n/a + 0x0)
While fixing this, provide a helper for any future `struct dirent' cloning.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/841
Cc: qemu-stable@nongnu.org
Co-authored-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Reviewed-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Vitaly Chikunov <vt@altlinux.org>
Tested-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Acked-by: Greg Kurz <groug@kaod.org>
Tested-by: Vitaly Chikunov <vt@altlinux.org>
Message-Id: <20220216181821.3481527-1-vt@altlinux.org>
[C.S. - Fix typo in source comment. ]
Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
|
|
The test is a bit different from the others, in that it does not run
if $membarrier is empty. For meson, the default can simply be disabled;
if one day we will toggle the default, no change is needed in meson.build.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Coroutine pool size was 64 from long ago, and the basis was organized in the commit message in 4d68e86b.
At that time, virtio-blk queue-size and num-queue were not configuable, and equivalent values were 128 and 1.
Coroutine pool size 64 was fine then.
Later queue-size and num-queue got configuable, and default values were increased.
Coroutine pool with size 64 exhausts frequently with random disk IO in new size, and slows down.
This commit adjusts coroutine pool size adaptively with new values.
This commit adds 64 by default, but now coroutine is not only for block devices,
and is not too much burdon comparing with new default.
pool size of 128 * vCPUs.
Signed-off-by: Hiroki Narukawa <hnarukaw@yahoo-corp.jp>
Message-id: 20220214115302.13294-2-hnarukaw@yahoo-corp.jp
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
|
|
We're missing an unlock in case installing the signal handler failed.
Fortunately, we barely see this error in real life.
Fixes: a960d6642d39 ("util/oslib-posix: Support concurrent os_mem_prealloc() invocation")
Fixes: CID 1468941
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta@ionos.com>
Cc: Daniel P. Berrangé <berrange@redhat.com>
Cc: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20220111120830.119912-1-david@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
__get_cpuid_max returns an unsigned value.
For consistency, store the result in an unsigned variable.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
|
|
The vhost-user-blk export runs requests asynchronously in their own
coroutine. When the vhost connection goes away and we want to stop the
vhost-user server, we need to wait for these coroutines to stop before
we can unmap the shared memory. Otherwise, they would still access the
unmapped memory and crash.
This introduces a refcount to VuServer which is increased when spawning
a new request coroutine and decreased before the coroutine exits. The
memory is only unmapped when the refcount reaches zero.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20220125151435.48792-1-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
'remotes/stefanha-gitlab/tags/block-pull-request' into staging
Pull request
# gpg: Signature made Wed 12 Jan 2022 17:13:54 GMT
# gpg: using RSA key 8695A8BFD3F97CDAAC35775A9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [full]
# gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" [full]
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8
* remotes/stefanha-gitlab/tags/block-pull-request:
virtio: unify dataplane and non-dataplane ->handle_output()
virtio: use ->handle_output() instead of ->handle_aio_output()
virtio-scsi: prepare virtio_scsi_handle_cmd for dataplane
virtio-blk: drop unused virtio_blk_handle_vq() return value
virtio: get rid of VirtIOHandleAIOOutput
aio-posix: split poll check from ready handler
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
|
|
Adaptive polling measures the execution time of the polling check plus
handlers called when a polled event becomes ready. Handlers can take a
significant amount of time, making it look like polling was running for
a long time when in fact the event handler was running for a long time.
For example, on Linux the io_submit(2) syscall invoked when a virtio-blk
device's virtqueue becomes ready can take 10s of microseconds. This
can exceed the default polling interval (32 microseconds) and cause
adaptive polling to stop polling.
By excluding the handler's execution time from the polling check we make
the adaptive polling calculation more accurate. As a result, the event
loop now stays in polling mode where previously it would have fallen
back to file descriptor monitoring.
The following data was collected with virtio-blk num-queues=2
event_idx=off using an IOThread. Before:
168k IOPS, IOThread syscalls:
9837.115 ( 0.020 ms): IO iothread1/620155 io_submit(ctx_id: 140512552468480, nr: 16, iocbpp: 0x7fcb9f937db0) = 16
9837.158 ( 0.002 ms): IO iothread1/620155 write(fd: 103, buf: 0x556a2ef71b88, count: 8) = 8
9837.161 ( 0.001 ms): IO iothread1/620155 write(fd: 104, buf: 0x556a2ef71b88, count: 8) = 8
9837.163 ( 0.001 ms): IO iothread1/620155 ppoll(ufds: 0x7fcb90002800, nfds: 4, tsp: 0x7fcb9f1342d0, sigsetsize: 8) = 3
9837.164 ( 0.001 ms): IO iothread1/620155 read(fd: 107, buf: 0x7fcb9f939cc0, count: 512) = 8
9837.174 ( 0.001 ms): IO iothread1/620155 read(fd: 105, buf: 0x7fcb9f939cc0, count: 512) = 8
9837.176 ( 0.001 ms): IO iothread1/620155 read(fd: 106, buf: 0x7fcb9f939cc0, count: 512) = 8
9837.209 ( 0.035 ms): IO iothread1/620155 io_submit(ctx_id: 140512552468480, nr: 32, iocbpp: 0x7fca7d0cebe0) = 32
174k IOPS (+3.6%), IOThread syscalls:
9809.566 ( 0.036 ms): IO iothread1/623061 io_submit(ctx_id: 140539805028352, nr: 32, iocbpp: 0x7fd0cdd62be0) = 32
9809.625 ( 0.001 ms): IO iothread1/623061 write(fd: 103, buf: 0x5647cfba5f58, count: 8) = 8
9809.627 ( 0.002 ms): IO iothread1/623061 write(fd: 104, buf: 0x5647cfba5f58, count: 8) = 8
9809.663 ( 0.036 ms): IO iothread1/623061 io_submit(ctx_id: 140539805028352, nr: 32, iocbpp: 0x7fd0d0388b50) = 32
Notice that ppoll(2) and eventfd read(2) syscalls are eliminated because
the IOThread stays in polling mode instead of falling back to file
descriptor monitoring.
As usual, polling is not implemented on Windows so this patch ignores
the new io_poll_read() callback in aio-win32.c.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Message-id: 20211207132336.36627-2-stefanha@redhat.com
[Fixed up aio_set_event_notifier() calls in
tests/unit/test-fdmon-epoll.c added after this series was queued.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
|
|
Reenable util/filemonitor-inotify compilation. Compilation was
disabled when commit a620fbe9ac ("configure: convert compiler tests
to meson, part 5") moved CONFIG_INOTIFY1 from config-host.mak to
config-host.h.
This fixes the usb-mtp device and reenables test-util-filemonitor.
Fixes: a620fbe9ac ("configure: convert compiler tests to meson, part 5")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/800
Signed-off-by: Volker Rümelin <vr_qemu@t-online.de>
Message-Id: <20220107133514.7785-1-vr_qemu@t-online.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Addition of div and rem on 128-bit integers, using the 128/64->128 divu and
64x64->128 mulu in host-utils.
These operations will be used within div/rem helpers in the 128-bit riscv
target.
Signed-off-by: Frédéric Pétrot <frederic.petrot@univ-grenoble-alpes.fr>
Co-authored-by: Fabien Portas <fabien.portas@grenoble-inp.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-id: 20220106210108.138226-4-frederic.petrot@univ-grenoble-alpes.fr
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
|
|
Temporarily modifying the SIGBUS handler is really nasty, as we might be
unlucky and receive an MCE SIGBUS while having our handler registered.
Unfortunately, there is no way around messing with SIGBUS when
MADV_POPULATE_WRITE is not applicable or not around.
Let's forward SIGBUS that don't belong to us to the already registered
handler and document the situation.
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20211217134611.31172-8-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Add a mutex to protect the SIGBUS case, as we cannot mess concurrently
with the sigbus handler and we have to manage the global variable
sigbus_memset_context. The MADV_POPULATE_WRITE path can run
concurrently.
Note that page_mutex and page_cond are shared between concurrent
invocations, which shouldn't be a problem.
This is a preparation for future virtio-mem prealloc code, which will call
os_mem_prealloc() asynchronously from an iothread when handling guest
requests.
Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20211217134611.31172-7-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Let's simplify the case when we only want a single thread and don't have
to mess with signal handlers.
Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20211217134611.31172-6-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
pages
Let's limit the number of threads to something sane, especially that
- We don't have more threads than the number of pages we have
- We don't have threads that initialize small (< 64 MiB) memory
Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20211217134611.31172-5-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Let's minimize the number of global variables to prepare for
os_mem_prealloc() getting called concurrently and make the code a bit
easier to read.
The only consumer that really needs a global variable is the sigbus
handler, which will require protection via a mutex in the future either way
as we cannot concurrently mess with the SIGBUS handler.
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20211217134611.31172-4-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Let's sense support and use it for preallocation. MADV_POPULATE_WRITE
does not require a SIGBUS handler, doesn't actually touch page content,
and avoids context switches; it is, therefore, faster and easier to handle
than our current approach.
While MADV_POPULATE_WRITE is, in general, faster than manual
prefaulting, and especially faster with 4k pages, there is still value in
prefaulting using multiple threads to speed up preallocation.
More details on MADV_POPULATE_WRITE can be found in the Linux commits
4ca9b3859dac ("mm/madvise: introduce MADV_POPULATE_(READ|WRITE) to prefault
page tables") and eb2faa513c24 ("mm/madvise: report SIGBUS as -EFAULT for
MADV_POPULATE_(READ|WRITE)"), and in the man page proposal [1].
This resolves the TODO in do_touch_pages().
In the future, we might want to look into using fallocate(), eventually
combined with MADV_POPULATE_READ, when dealing with shared file/fd
mappings and not caring about memory bindings.
[1] https://lkml.kernel.org/r/20210816081922.5155-1-david@redhat.com
Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20211217134611.31172-3-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Let's prepare touch_all_pages() for returning differing errors. Return
an error from the thread and report the last processed error.
Translate SIGBUS to -EFAULT, as a SIGBUS can mean all different kind of
things (memory error, read error, out of memory). When allocating memory
fails via the current SIGBUS-based mechanism, we'll get:
os_mem_prealloc: preallocating memory failed: Bad address
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20211217134611.31172-2-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Invoke the transaction drivers' .clean() methods only after all
.commit() or .abort() handlers are done.
This makes it easier to have nested transactions where the top-level
transactions pass objects to lower transactions that the latter can
still use throughout their commit/abort phases, while the top-level
transaction keeps a reference that is released in its .clean() method.
(Before this commit, that is also possible, but the top-level
transaction would need to take care to invoke tran_add() before the
lower-level transaction does. This commit makes the ordering
irrelevant, which is just a bit nicer.)
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20211111120829.81329-8-hreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20211115145409.176785-8-kwolf@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
|
|
The drain_rcu_call() function can be blocked as long as an RCU reader
stays in a read-side critical section. This is typically what happens
when a TCG vCPU is executing a busy loop. It can deadlock the QEMU
monitor as reported in https://gitlab.com/qemu-project/qemu/-/issues/650 .
This can be avoided by allowing drain_rcu_call() to enforce an RCU grace
period. Since each reader might need to do specific actions to end a
read-side critical section, do it with notifiers.
Prepare ground for this by adding a notifier list to the RCU reader
struct and use it in wait_for_readers() if drain_rcu_call() is in
progress. An API is added for readers to register their notifiers.
This is largely based on a draft from Paolo Bonzini.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20211109183523.47726-2-groug@kaod.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
As qemu guidelines:
Unless a pointer is used to modify the pointed-to storage, give it the
"const" attribute.
In the particular case of iova_tree_find it allows to enforce what is
requested by its comment, since the compiler would shout in case of
modifying or freeing the const-qualified returned pointer.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20211013182713.888753-2-eperezma@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
These will be used to implement new decimal floating point
instructions from Power ISA 3.1.
The remainder is now returned directly by divu128/divs128,
freeing up phigh to receive the high 64 bits of the quotient.
Signed-off-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20211025191154.350831-4-luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
In preparation for changing the divu128/divs128 implementations
to allow for quotients larger than 64 bits, move the div-by-zero
and overflow checks to the callers.
Signed-off-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20211025191154.350831-2-luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Use QTAILQ_FOREACH_SAFE() so that the current QemuOpts can be deleted
while iterating through the whole list.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20211008133442.141332-11-kwolf@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20211007130829.632254-15-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This makes the pthreads check dead in configure, so remove it
as well.
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20211007130829.632254-9-pbonzini@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This allows the use of native signalfd instead of the sigtimedwait
based emulation on systems other than Linux.
Signed-off-by: Kacper Słomiński <kacper.slominski72@gmail.com>
Message-Id: <20210905011621.200785-1-kacper.slominski72@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
staging
* SGX implementation for x86
* Miscellaneous bugfixes
* Fix dependencies from ROMs to qtests
# gpg: Signature made Thu 30 Sep 2021 14:30:35 BST
# gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg: issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1
# Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83
* remotes/bonzini-gitlab/tags/for-upstream: (33 commits)
meson_options.txt: Switch the default value for the vnc option to 'auto'
build-sys: add HAVE_IPPROTO_MPTCP
memory: Add tracepoint for dirty sync
memory: Name all the memory listeners
target/i386: Fix memory leak in sev_read_file_base64()
tests: qtest: bios-tables-test depends on the unpacked edk2 ROMs
meson: unpack edk2 firmware even if --disable-blobs
target/i386: Add the query-sgx-capabilities QMP command
target/i386: Add HMP and QMP interfaces for SGX
docs/system: Add SGX documentation to the system manual
sgx-epc: Add the fill_device_info() callback support
i440fx: Add support for SGX EPC
q35: Add support for SGX EPC
i386: acpi: Add SGX EPC entry to ACPI tables
i386/pc: Add e820 entry for SGX EPC section(s)
hw/i386/pc: Account for SGX EPC sections when calculating device memory
hw/i386/fw_cfg: Set SGX bits in feature control fw_cfg accordingly
Adjust min CPUID level to 0x12 when SGX is enabled
i386: Propagate SGX CPUID sub-leafs to KVM
i386: kvm: Add support for exposing PROVISIONKEY to guest
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
|
|
The QAPI schema shouldn't rely on C system headers #define, but on
configure-time project #define, so we can express the build condition in
a C-independent way.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20210907121943.3498701-3-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The previous code didn't detect overflows if the high 64-bit
of the dividend were equal to the 64-bit divisor. In that case,
64 bits wouldn't be enough to hold the quotient.
Signed-off-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210910112624.72748-2-luis.pires@eldorado.org.br>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
|