aboutsummaryrefslogtreecommitdiff
path: root/util
AgeCommit message (Collapse)Author
2017-03-15os: don't corrupt pre-existing memory-backend data with preallocDaniel P. Berrange
When using a memory-backend object with prealloc turned on, QEMU will memset() the first byte in every memory page to zero. While this might have been acceptable for memory backends associated with RAM, this corrupts application data for NVDIMMs. Instead of setting every page to zero, read the current byte value and then just write that same value back, so we are not corrupting the original data. Directly write the value instead of memset()ing it, since there's no benefit to memset for a single byte write. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Message-id: 20170303113255.28262-1-berrange@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-03-14icount: process QEMU_CLOCK_VIRTUAL timers in vCPU threadPaolo Bonzini
icount has become much slower after tcg_cpu_exec has stopped using the BQL. There is also a latent bug that is masked by the slowness. The slowness happens because every occurrence of a QEMU_CLOCK_VIRTUAL timer now has to wake up the I/O thread and wait for it. The rendez-vous is mediated by the BQL QemuMutex: - handle_icount_deadline wakes up the I/O thread with BQL taken - the I/O thread wakes up and waits on the BQL - the VCPU thread releases the BQL a little later - the I/O thread raises an interrupt, which calls qemu_cpu_kick - the VCPU thread notices the interrupt, takes the BQL to process it and waits on it All this back and forth is extremely expensive, causing a 6 to 8-fold slowdown when icount is turned on. One may think that the issue is that the VCPU thread is too dependent on the BQL, but then the latent bug comes in. I first tried removing the BQL completely from the x86 cpu_exec, only to see everything break. The only way to fix it (and make everything slow again) was to add a dummy BQL lock/unlock pair. This is because in -icount mode you really have to process the events before the CPU restarts executing the next instruction. Therefore, this series moves the processing of QEMU_CLOCK_VIRTUAL timers straight in the vCPU thread when running in icount mode. The required changes include: - make the timer notification callback wake up TCG's single vCPU thread when run from another thread. By using async_run_on_cpu, the callback can override all_cpu_threads_idle() when the CPU is halted. - move handle_icount_deadline after qemu_tcg_wait_io_event, so that the timer notification callback is invoked after the dummy work item wakes up the vCPU thread - make handle_icount_deadline run the timers instead of just waking the I/O thread. - stop processing the timers in the main loop Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-14cpus: define QEMUTimerListNotifyCB for QEMU system emulationPaolo Bonzini
There is no change for now, because the callback just invokes qemu_notify_event. Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-14qemu-timer: do not include sysemu/cpus.h from util/qemu-timer.hPaolo Bonzini
This dependency is the wrong way, and we will need util/qemu-timer.h from sysemu/cpus.h in the next patch. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-14qemu-timer: fix off-by-onePaolo Bonzini
If the first timer is exactly at the current value of the clock, the deadline is met and the timer should fire. This fixes itself on the next iteration of the loop without icount; with icount, however, execution of instructions will stop exactly at the deadline and won't proceed. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-14util: Removed unneeded header from path.cSuramya Shah
Signed-off-by: Suramya Shah <shah.suramya@gmail.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20170310163948.7567-1-shah.suramya@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-14mem-prealloc: reduce large guest start-up and migration time.Jitendra Kolhe
Using "-mem-prealloc" option for a large guest leads to higher guest start-up and migration time. This is because with "-mem-prealloc" option qemu tries to map every guest page (create address translations), and make sure the pages are available during runtime. virsh/libvirt by default, seems to use "-mem-prealloc" option in case the guest is configured to use huge pages. The patch tries to map all guest pages simultaneously by spawning multiple threads. Currently limiting the change to QEMU library functions on POSIX compliant host only, as we are not sure if the problem exists on win32. Below are some stats with "-mem-prealloc" option for guest configured to use huge pages. ------------------------------------------------------------------------ Idle Guest | Start-up time | Migration time ------------------------------------------------------------------------ Guest stats with 2M HugePage usage - single threaded (existing code) ------------------------------------------------------------------------ 64 Core - 4TB | 54m11.796s | 75m43.843s 64 Core - 1TB | 8m56.576s | 14m29.049s 64 Core - 256GB | 2m11.245s | 3m26.598s ------------------------------------------------------------------------ Guest stats with 2M HugePage usage - map guest pages using 8 threads ------------------------------------------------------------------------ 64 Core - 4TB | 5m1.027s | 34m10.565s 64 Core - 1TB | 1m10.366s | 8m28.188s 64 Core - 256GB | 0m19.040s | 2m10.148s ----------------------------------------------------------------------- Guest stats with 2M HugePage usage - map guest pages using 16 threads ----------------------------------------------------------------------- 64 Core - 4TB | 1m58.970s | 31m43.400s 64 Core - 1TB | 0m39.885s | 7m55.289s 64 Core - 256GB | 0m11.960s | 2m0.135s ----------------------------------------------------------------------- Changed in v2: - modify number of memset threads spawned to min(smp_cpus, 16). - removed 64GB memory restriction for spawning memset threads. Changed in v3: - limit number of threads spawned based on min(sysconf(_SC_NPROCESSORS_ONLN), 16, smp_cpus) - implement memset thread specific siglongjmp in SIGBUS signal_handler. Changed in v4 - remove sigsetjmp/siglongjmp and SIGBUS unblock/block for main thread as main thread no longer touches any pages. - simplify code my returning memset_thread_failed status from touch_all_pages. Signed-off-by: Jitendra Kolhe <jitendra.kolhe@hpe.com> Message-Id: <1487907103-32350-1-git-send-email-jitendra.kolhe@hpe.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-07keyval: Support listsMarkus Armbruster
Additionally permit non-negative integers as key components. A dictionary's keys must either be all integers or none. If all keys are integers, convert the dictionary to a list. The set of keys must be [0,N]. Examples: * list.1=goner,list.0=null,list.1=eins,list.2=zwei is equivalent to JSON [ "null", "eins", "zwei" ] * a.b.c=1,a.b.0=2 is inconsistent: a.b.c clashes with a.b.0 * list.0=null,list.2=eins,list.2=zwei has a hole: list.1 is missing Similar design flaw as for objects: there is no way to denote an empty list. While interpreting "key absent" as empty list seems natural (removing a list member from the input string works when there are multiple ones, so why not when there's just one), it doesn't work: "key absent" already means "optional list absent", which isn't the same as "empty list present". Update the keyval object visitor to use this a.0 syntax in error messages rather than the usual a[0]. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1488317230-26248-25-git-send-email-armbru@redhat.com> [Off-by-one fix squashed in, as per Kevin's review] Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2017-03-07keyval: Restrict key components to valid QAPI namesMarkus Armbruster
Until now, key components are separated by '.'. This leaves little room for evolving the syntax, and is incompatible with the __RFQDN_ prefix convention for downstream extensions. Since key components will be commonly used as QAPI member names by the QObject input visitor, we can just as well borrow the QAPI naming rules here: letters, digits, hyphen and period starting with a letter, with an optional __RFQDN_ prefix for downstream extensions. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <1488317230-26248-20-git-send-email-armbru@redhat.com>
2017-03-07keyval: New keyval_parse()Markus Armbruster
keyval_parse() parses KEY=VALUE,... into a QDict. Works like qemu_opts_parse(), except: * Returns a QDict instead of a QemuOpts (d'oh). * Supports nesting, unlike QemuOpts: a KEY is split into key fragments at '.' (dotted key convention; the block layer does something similar on top of QemuOpts). The key fragments are QDict keys, and the last one's value is updated to VALUE. * Each key fragment may be up to 127 bytes long. qemu_opts_parse() limits the entire key to 127 bytes. * Overlong key fragments are rejected. qemu_opts_parse() silently truncates them. * Empty key fragments are rejected. qemu_opts_parse() happily accepts empty keys. * It does not store the returned value. qemu_opts_parse() stores it in the QemuOptsList. * It does not treat parameter "id" specially. qemu_opts_parse() ignores all but the first "id", and fails when its value isn't id_wellformed(), or duplicate (a QemuOpts with the same ID is already stored). It also screws up when a value contains ",id=". * Implied value is not supported. qemu_opts_parse() desugars "foo" to "foo=on", and "nofoo" to "foo=off". * An implied key's value can't be empty, and can't contain ','. I intend to grow this into a saner replacement for QemuOpts. It'll take time, though. Note: keyval_parse() provides no way to do lists, and its key syntax is incompatible with the __RFQDN_ prefix convention for downstream extensions, because it blindly splits at '.', even in __RFQDN_. Both issues will be addressed later in the series. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1488317230-26248-4-git-send-email-armbru@redhat.com>
2017-03-04Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.9-20170303' ↵Peter Maydell
into staging ppc patch queuye for 2017-03-03 This will probably be my last pull request before the hard freeze. It has some new work, but that has all been posted in draft before the soft freeze, so I think it's reasonable to include in qemu-2.9. This batch has: * A substantial amount of POWER9 work * Implements the legacy (hash) MMU for POWER9 * Some more preliminaries for implementing the POWER9 radix MMU * POWER9 has_work * Basic POWER9 compatibility mode handling * Removal of some premature tests * Some cleanups and fixes to the existing MMU code to make the POWER9 work simpler * A bugfix for TCG multiply adds on power * Allow pseries guests to access PCIe extended config space This also includes a code-motion not strictly in ppc code - moving getrampagesize() from ppc code to exec.c. This will make some future VFIO improvements easier, Paolo said it was ok to merge via my tree. # gpg: Signature made Fri 03 Mar 2017 03:20:36 GMT # gpg: using RSA key 0x6C38CACA20D9B392 # gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" # gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" # gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" # gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" # Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392 * remotes/dgibson/tags/ppc-for-2.9-20170303: target/ppc: rewrite f[n]m[add,sub] using float64_muladd spapr: Small cleanup of PPC MMU enums spapr_pci: Advertise access to PCIe extended config space target/ppc: Rework hash mmu page fault code and add defines for clarity target/ppc: Move no-execute and guarded page checking into new function target/ppc: Add execute permission checking to access authority check target/ppc: Add Instruction Authority Mask Register Check hw/ppc/spapr: Add POWER9 to pseries cpu models target/ppc/POWER9: Add cpu_has_work function for POWER9 target/ppc/POWER9: Add POWER9 pa-features definition target/ppc/POWER9: Add POWER9 mmu fault handler target/ppc: Don't gen an SDR1 on POWER9 and rework register creation target/ppc: Add patb_entry to sPAPRMachineState target/ppc/POWER9: Add POWERPC_MMU_V3 bit powernv: Don't test POWER9 CPU yet exec, kvm, target-ppc: Move getrampagesize() to common code target/ppc: Add POWER9/ISAv3.00 to compat_table Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-03-03cpus: remove ugly cast on sigbus_handlerPaolo Bonzini
The cast is there because sigbus_handler is invoked via sigfd_handler. But it feels just wrong to use struct qemu_signalfd_siginfo in the prototype of a function that is passed to sigaction. Instead, do a simple-minded conversion of qemu_signalfd_siginfo to siginfo_t. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-03-03exec, kvm, target-ppc: Move getrampagesize() to common codeAlexey Kardashevskiy
getrampagesize() returns the largest supported page size and mainly used to know if huge pages are enabled. However is implemented in target-ppc/kvm.c and not available in TCG or other architectures. This renames and moves gethugepagesize() to mmap-alloc.c where fd-based analog of it is already implemented. This renames and moves getrampagesize() to exec.c as it seems to be the common place for helpers like this. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-02Merge remote-tracking branch 'remotes/elmarco/tags/leak-pull-request' into ↵Peter Maydell
staging # gpg: Signature made Wed 01 Mar 2017 09:02:53 GMT # gpg: using RSA key 0xDAE8E10975969CE5 # gpg: Good signature from "Marc-André Lureau <marcandre.lureau@redhat.com>" # gpg: aka "Marc-André Lureau <marcandre.lureau@gmail.com>" # gpg: WARNING: This key is not certified with sufficiently trusted signatures! # gpg: It is not certain that the signature belongs to the owner. # Primary key fingerprint: 87A9 BD93 3F87 C606 D276 F62D DAE8 E109 7596 9CE5 * remotes/elmarco/tags/leak-pull-request: (28 commits) tests: fix virtio-blk-test leaks tests: add specialized device_find function tests: fix usb-test leaks tests: allows to run single test in usb-hcd-ehci-test usb: release the created buses bus: do not unref hotplug handler tests: fix virtio-9p-test leaks tests: fix virtio-scsi-test leak tests: fix e1000e leaks tests: fix i440fx-test leaks tests: fix e1000-test leak tests: fix tco-test leaks tests: fix eepro100-test leak pc: pcihp: avoid adding ACPI_PCIHP_PROP_BSEL twice tests: fix ipmi-bt-test leak tests: fix ipmi-kcs-test leak tests: fix bios-tables-test leak tests: fix hd-geo-test leaks tests: fix ide-test leaks tests: fix vhost-user-test leaks ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-03-01timer: use an inline function for freeMarc-André Lureau
Similarly to allocation, do it from an inline function. This allows tests to only use the headers for allocation/free of timer. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2017-02-28option: Tweak invalid size error message and unbreak iotest 049Markus Armbruster
Commit 75cdcd1 neglected to update tests/qemu-iotests/049.out, and made the error message for negative size worse. Fix that. Reported-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-02-23option: Fix checking of sizes for overflow and trailing crapMarkus Armbruster
parse_option_size()'s checking for overflow and trailing crap is wrong. Has always been that way. qemu_strtosz() gets it right, so use that. This adds support for size suffixes 'P', 'E', and ignores case for all suffixes, not just 'k'. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <1487708048-2131-25-git-send-email-armbru@redhat.com>
2017-02-23util/cutils: Change qemu_strtosz*() from int64_t to uint64_tMarkus Armbruster
This will permit its use in parse_option_size(). Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> (maintainer:X86) Cc: Kevin Wolf <kwolf@redhat.com> (supporter:Block layer core) Cc: Max Reitz <mreitz@redhat.com> (supporter:Block layer core) Cc: qemu-block@nongnu.org (open list:Block layer core) Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <1487708048-2131-24-git-send-email-armbru@redhat.com>
2017-02-23util/cutils: Return qemu_strtosz*() error and value separatelyMarkus Armbruster
This makes qemu_strtosz(), qemu_strtosz_mebi() and qemu_strtosz_metric() similar to qemu_strtoi64(), except negative values are rejected. Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> (maintainer:X86) Cc: Kevin Wolf <kwolf@redhat.com> (supporter:Block layer core) Cc: Max Reitz <mreitz@redhat.com> (supporter:Block layer core) Cc: qemu-block@nongnu.org (open list:Block layer core) Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <1487708048-2131-23-git-send-email-armbru@redhat.com>
2017-02-23util/cutils: Let qemu_strtosz*() optionally reject trailing crapMarkus Armbruster
Change the qemu_strtosz() & friends to return -EINVAL when @endptr is null and the conversion doesn't consume the string completely. Matches how qemu_strtol() & friends work. Only test_qemu_strtosz_simple() passes a null @endptr. No functional change there, because its conversion consumes the string. Simplify callers that use @endptr only to fail when it doesn't point to '\0' to pass a null @endptr instead. Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> (maintainer:X86) Cc: Kevin Wolf <kwolf@redhat.com> (supporter:Block layer core) Cc: Max Reitz <mreitz@redhat.com> (supporter:Block layer core) Cc: qemu-block@nongnu.org (open list:Block layer core) Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <1487708048-2131-22-git-send-email-armbru@redhat.com>
2017-02-23util/cutils: Drop QEMU_STRTOSZ_DEFSUFFIX_* macrosMarkus Armbruster
Writing QEMU_STRTOSZ_DEFSUFFIX_* instead of '*' gains nothing. Get rid of these eyesores. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <1487708048-2131-18-git-send-email-armbru@redhat.com>
2017-02-23util/cutils: New qemu_strtosz()Markus Armbruster
Most callers of qemu_strtosz_suffix() pass QEMU_STRTOSZ_DEFSUFFIX_B. Capture the pattern in new qemu_strtosz(). Inline qemu_strtosz_suffix() into its only remaining caller. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <1487708048-2131-17-git-send-email-armbru@redhat.com>
2017-02-23util/cutils: Rename qemu_strtosz() to qemu_strtosz_MiB()Markus Armbruster
With qemu_strtosz(), no suffix means mebibytes. It's used rarely. I'm going to add a similar function where no suffix means bytes. Rename qemu_strtosz() to qemu_strtosz_MiB() to make the name qemu_strtosz() available for the new function. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <1487708048-2131-16-git-send-email-armbru@redhat.com>
2017-02-23util/cutils: New qemu_strtosz_metric()Markus Armbruster
To parse numbers with metric suffixes, we use qemu_strtosz_suffix_unit(nptr, &eptr, QEMU_STRTOSZ_DEFSUFFIX_B, 1000) Capture this in a new function for legibility: qemu_strtosz_metric(nptr, &eptr) Replace test_qemu_strtosz_suffix_unit() by test_qemu_strtosz_metric(). Rename qemu_strtosz_suffix_unit() to do_strtosz() and give it internal linkage. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <1487708048-2131-15-git-send-email-armbru@redhat.com>
2017-02-23option: Fix to reject invalid and overflowing numbersMarkus Armbruster
parse_option_number() fails to check for these errors after strtoull(). Has always been broken. Fix that. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <1487708048-2131-10-git-send-email-armbru@redhat.com>
2017-02-23util/cutils: Clean up control flow around qemu_strtol() a bitMarkus Armbruster
Reorder check_strtox_error() to make it obvious that we always store through a non-null @endptr. Transform if (some error) { error case ... err = value for error case; } else { normal case ... err = value for normal case; } return err; to if (some error) { error case ... return value for error case; } normal case ... return value for normal case; Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <1487708048-2131-9-git-send-email-armbru@redhat.com>
2017-02-23util/cutils: Clean up variable names around qemu_strtol()Markus Armbruster
Name same things the same, different things differently. * qemu_strtol()'s parameter @nptr is called @p in check_strtox_error(). Rename the latter. * qemu_strtol()'s parameter @endptr is called @next in check_strtox_error(). Rename the latter. * qemu_strtol()'s variable @p is called @endptr in check_strtox_error(). Rename both to @ep. * qemu_strtol()'s variable @err is *negative* errno, check_strtox_error()'s parameter @err is *positive*. Rename the latter to @libc_errno. Same for qemu_strtoul(), qemu_strtoi64(), qemu_strtou64(), of course. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <1487708048-2131-8-git-send-email-armbru@redhat.com>
2017-02-23util/cutils: Rename qemu_strtoll(), qemu_strtoull()Markus Armbruster
The name qemu_strtoll() suggests conversion to long long, but it actually converts to int64_t. Rename to qemu_strtoi64(). The name qemu_strtoull() suggests conversion to unsigned long long, but it actually converts to uint64_t. Rename to qemu_strtou64(). Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <1487708048-2131-7-git-send-email-armbru@redhat.com>
2017-02-23util/cutils: Rewrite documentation of qemu_strtol() & friendsMarkus Armbruster
Fixes the following documentation bugs: * Fails to document that null @nptr is safe. * Fails to document that we return -EINVAL when no conversion could be performed (commit 47d4be1). * Confuses long long with int64_t, and unsigned long long with uint64_t. * Claims the unsigned conversions can underflow. They can't. While there, mark problematic assumptions that int64_t is long long, and uint64_t is unsigned long long with FIXME comments. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <1487708048-2131-6-git-send-email-armbru@redhat.com>
2017-02-23option: Assert value string isn't nullMarkus Armbruster
Plenty of code relies on QemuOpt member @str not being null, including qemu_opts_print(), qemu_opts_to_qdict(), and callbacks passed to qemu_opt_foreach(). Begs the question whether it can be null. Only opt_set() creates QemuOpt. It sets member @str to its argument @value. Passing null for @value would plant a time bomb. Callers: * opts_do_parse() can't pass null. * qemu_opt_set() passes its argument @value. Callers: - qemu_opts_from_qdict_1() can't pass null - qemu_opts_set() passes its argument @value, but none of its callers pass null. - Many more outside qemu-option.c, but they shouldn't pass null, either. Assert member @str isn't null, so that misuse is caught right away. Simplify parse_option_bool(), parse_option_number() and parse_option_size() accordingly. Best viewed with whitespace changes ignored. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <1487708048-2131-3-git-send-email-armbru@redhat.com>
2017-02-21coroutine-lock: make CoRwlock thread-safe and fairPaolo Bonzini
This adds a CoMutex around the existing CoQueue. Because the write-side can just take CoMutex, the old "writer" field is not necessary anymore. Instead of removing it altogether, count the number of pending writers during a read-side critical section and forbid further readers from entering. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20170213181244.16297-7-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-02-21coroutine-lock: add mutex argument to CoQueue APIsPaolo Bonzini
All that CoQueue needs in order to become thread-safe is help from an external mutex. Add this to the API. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20170213181244.16297-6-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-02-21coroutine-lock: add limited spinning to CoMutexPaolo Bonzini
Running a very small critical section on pthread_mutex_t and CoMutex shows that pthread_mutex_t is much faster because it doesn't actually go to sleep. What happens is that the critical section is shorter than the latency of entering the kernel and thus FUTEX_WAIT always fails. With CoMutex there is no such latency but you still want to avoid wait and wakeup. So introduce it artificially. This only works with one waiters; because CoMutex is fair, it will always have more waits and wakeups than a pthread_mutex_t. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20170213181244.16297-3-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-02-21coroutine-lock: make CoMutex thread-safePaolo Bonzini
This uses the lock-free mutex described in the paper '"Blocking without Locking", or LFTHREADS: A lock-free thread library' by Gidenstam and Papatriantafilou. The same technique is used in OSv, and in fact the code is essentially a conversion to C of OSv's code. [Added missing coroutine_fn in tests/test-aio-multithread.c. --Stefan] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20170213181244.16297-2-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-02-21async: remove unnecessary inc/dec pairsPaolo Bonzini
Pull the increment/decrement pair out of aio_bh_poll and into the callers. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170213135235.12274-18-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-02-21aio-posix: partially inline aio_dispatch into aio_pollPaolo Bonzini
This patch prepares for the removal of unnecessary lockcnt inc/dec pairs. Extract the dispatching loop for file descriptor handlers into a new function aio_dispatch_handlers, and then inline aio_dispatch into aio_poll. aio_dispatch can now become void. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170213135235.12274-17-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-02-21block: explicitly acquire aiocontext in aio callbacks that need itPaolo Bonzini
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170213135235.12274-16-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-02-21block: explicitly acquire aiocontext in bottom halves that need itPaolo Bonzini
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170213135235.12274-15-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-02-21block: explicitly acquire aiocontext in callbacks that need itPaolo Bonzini
This covers both file descriptor callbacks and polling callbacks, since they execute related code. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170213135235.12274-14-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-02-21block: explicitly acquire aiocontext in timers that need itPaolo Bonzini
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170213135235.12274-13-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-02-21aio: push aio_context_acquire/release down to dispatchingPaolo Bonzini
The AioContext data structures are now protected by list_lock and/or they are walked with FOREACH_RCU primitives. There is no need anymore to acquire the AioContext for the entire duration of aio_dispatch. Instead, just acquire it before and after invoking the callbacks. The next step is then to push it further down. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170213135235.12274-12-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-02-21coroutine-lock: reschedule coroutine on the AioContext it was running onPaolo Bonzini
As a small step towards the introduction of multiqueue, we want coroutines to remain on the same AioContext that started them, unless they are moved explicitly with e.g. aio_co_schedule. This patch avoids that coroutines switch AioContext when they use a CoMutex. For now it does not make much of a difference, because the CoMutex is not thread-safe and the AioContext itself is used to protect the CoMutex from concurrent access. However, this is going to change. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170213135235.12274-9-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-02-21aio: introduce aio_co_schedule and aio_co_wakePaolo Bonzini
aio_co_wake provides the infrastructure to start a coroutine on a "home" AioContext. It will be used by CoMutex and CoQueue, so that coroutines don't jump from one context to another when they go to sleep on a mutex or waitqueue. However, it can also be used as a more efficient alternative to one-shot bottom halves, and saves the effort of tracking which AioContext a coroutine is running on. aio_co_schedule is the part of aio_co_wake that starts a coroutine on a remove AioContext, but it is also useful to implement e.g. bdrv_set_aio_context callbacks. The implementation of aio_co_schedule is based on a lock-free multiple-producer, single-consumer queue. The multiple producers use cmpxchg to add to a LIFO stack. The consumer (a per-AioContext bottom half) grabs all items added so far, inverts the list to make it FIFO, and goes through it one item at a time until it's empty. The data structure was inspired by OSv, which uses it in the very code we'll "port" to QEMU for the thread-safe CoMutex. Most of the new code is really tests. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20170213135235.12274-3-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-02-21block: move AioContext, QEMUTimer, main-loop to libqemuutilPaolo Bonzini
AioContext is fairly self contained, the only dependency is QEMUTimer but that in turn doesn't need anything else. So move them out of block-obj-y to avoid introducing a dependency from io/ to block-obj-y. main-loop and its dependency iohandler also need to be moved, because later in this series io/ will call iohandler_get_aio_context. [Changed copyright "the QEMU team" to "other QEMU contributors" as suggested by Daniel Berrange and agreed by Paolo. --Stefan] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20170213135235.12274-2-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-02-13migration: consolidate VMStateField.startHalil Pasic
The member VMStateField.start is used for two things, partial data migration for VBUFFER data (basically provide migration for a sub-buffer) and for locating next in QTAILQ. The implementation of the VBUFFER feature is broken when VMSTATE_ALLOC is used. This however goes unnoticed because actually partial migration for VBUFFER is not used at all. Let's consolidate the usage of VMStateField.start by removing support for partial migration for VBUFFER. Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com> Message-Id: <20170203175217.45562-1-pasic@linux.vnet.ibm.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2017-02-09util: add iterators for QemuOpts valuesDaniel P. Berrange
To iterate over all QemuOpts currently requires using a callback function which is inconvenient for control flow. Add support for using iterator functions more directly QemuOptsIter iter; QemuOpt *opt; qemu_opts_iter_init(&iter, opts, "repeated-key"); while ((opt = qemu_opts_iter_next(&iter)) != NULL) { ....do something... } Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170203120649.15637-8-berrange@redhat.com Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2017-01-31host-utils: Implement unsigned quadword left/right shift and unit testsJose Ricardo Ziviani
Implements 128-bit left shift and right shift as well as their testcases. By design, shift silently mods by 128, so the caller is responsible to assert the shift range if necessary. Left shift sets the overflow flag if any non-zero digit is shifted out. Examples: ulshift(&low, &high, 250, &overflow); equivalent: n << 122 urshift(&low, &high, -2); equivalent: n << 126 Signed-off-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com> Reviewed-by: Eric Blake <eblake@redhat.com> [dwg: Added test-shift128 to .gitignore] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-01-31host-utils: Move 128-bit guard macro to .c fileJose Ricardo Ziviani
It is not possible to implement functions in host-utils.c for architectures with quadwords because the guard is implemented in the Makefile. This patch move the guard out of the Makefile to the implementation file. Signed-off-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-01-26hbitmap: Add hbitmap_is_serializable()Max Reitz
Bitmaps with a granularity of 58 or above can be neither serialized nor deserialized (see the comment in the function added in this series for an explanation). This patch adds a function so that we can check whether a bitmap actually can be (de-)serialized at all, thus avoiding failing the necessary assertion in hbitmap_serialization_granularity(). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20161115225746.3590-2-mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>
2017-01-24win32: use glib gpoll if glib >= 2.50Marc-André Lureau
A fix has been committed in upstream glib commit 210a9796f78eb90f76f1bd6a304e9fea05e97617. (See also related bug https://bugzilla.gnome.org/show_bug.cgi?id=764415) It is desirable to use the glib version instead of qemu copy, since it provides more debugging facilities (G_MAIN_POLL_DEBUG etc), and hopefully has a better maintainance. Hopefully, we can drop the qemu copy in a few years. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>