path: root/docs
Age | Commit message | Author
2015-09-24 | vhost-user: add VHOST_USER_GET_QUEUE_NUM message | Yuanhan Liu
This is for querying how many queues the backend supports if it has mq support (when the VHOST_USER_PROTOCOL_F_MQ flag is set in the queried protocol features). vhost_net_get_max_queues() is the interface to export that value, and to tell whether the backend supports the number of queues the user requested, which is done in the following patch. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: Marcel Apfelbaum <marcel@redhat.com>
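As a rough illustration of the check this message makes possible, the sketch below only trusts the backend's queue count when the MQ protocol feature bit was negotiated. The bit position, the function names, and the fallback to a single queue are assumptions made for the example, not QEMU's implementation.

```python
# Assumed bit position of VHOST_USER_PROTOCOL_F_MQ, for illustration only.
VHOST_USER_PROTOCOL_F_MQ = 0


def max_queues(protocol_features, backend_queue_num):
    """Return the queue count the backend is taken to support.

    backend_queue_num stands in for the reply to a
    VHOST_USER_GET_QUEUE_NUM request (hypothetical parameter).
    """
    if protocol_features & (1 << VHOST_USER_PROTOCOL_F_MQ):
        return backend_queue_num
    # Without the MQ feature, assume a single queue.
    return 1


def queues_supported(requested, protocol_features, backend_queue_num):
    """Tell whether the number of queues the user requested is available."""
    return requested <= max_queues(protocol_features, backend_queue_num)
```

With the MQ bit set and a backend reporting 4 queues, a request for 2 queues passes; without the bit, the same request is rejected.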
2015-09-24 | vhost: rename VHOST_RESET_OWNER to VHOST_RESET_DEVICE | Yuanhan Liu
Quote from Michael: We really should rename VHOST_RESET_OWNER to VHOST_RESET_DEVICE. Suggested-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com> Tested-by: Marcel Apfelbaum <marcel@redhat.com>
2015-09-24 | vhost-user: add protocol feature negotiation | Michael S. Tsirkin
Support a separate bitmask for vhost-user protocol features, and messages to get/set protocol features. Invoke them at init. No features are defined yet. [ leverage vhost_user_call for request handling -- Yuanhan Liu ] Signed-off-by: Michael S. Tsirkin <address@hidden> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com> Tested-by: Marcel Apfelbaum <marcel@redhat.com>
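The get/set exchange described here can be sketched as a simple mask intersection at init time. FakeBackend and init_protocol_features are hypothetical names invented for this example; the real message handling lives in QEMU's C code and is not shown.

```python
class FakeBackend:
    """Hypothetical stand-in for a vhost-user backend."""

    def __init__(self, offered):
        self.offered = offered   # protocol feature bits the backend offers
        self.acked = None        # bits acknowledged by the frontend

    def get_protocol_features(self):
        return self.offered

    def set_protocol_features(self, mask):
        self.acked = mask


def init_protocol_features(backend, supported_mask):
    """Query the backend's bits, keep those we also support, report back."""
    offered = backend.get_protocol_features()
    agreed = offered & supported_mask
    backend.set_protocol_features(agreed)
    return agreed
```

Keeping the protocol bits in their own mask means new protocol features can be added without touching the virtio feature bits.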
2015-09-23 | spapr: Support ibm,dynamic-reconfiguration-memory | Bharata B Rao
Parse ibm,architecture.vec table obtained from the guest and enable memory node configuration via ibm,dynamic-reconfiguration-memory if guest supports it. This is in preparation to support memory hotplug for sPAPR guests. This changes the way memory node configuration is done. Currently all memory nodes are built upfront. But after this patch, only memory@0 node for RMA is built upfront. Guest kernel boots with just that and rest of the memory nodes (via memory@XXX or ibm,dynamic-reconfiguration-memory) are built when guest does ibm,client-architecture-support call. Note: This patch needs a SLOF enhancement which is already part of SLOF binary in QEMU. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-09-21 | qapi-introspect: Hide type names | Markus Armbruster
To eliminate the temptation for clients to look up types by name (which are not ABI), replace all type names by meaningless strings. Reduces output of query-schema by 13 out of 85KiB. As a debugging aid, provide option -u to suppress the hiding. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <1442401589-24189-27-git-send-email-armbru@redhat.com>
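A minimal sketch of the idea, assuming the meaningless strings are simply sequence numbers (the commit message does not say what they look like); the function name and the unmask parameter are hypothetical, the latter mirroring the -u option.

```python
def hide_type_names(type_names, unmask=False):
    """Map each type name to the string that introspection would expose.

    With unmask=True the real names are kept, as a debugging aid.
    """
    if unmask:
        return {name: name for name in type_names}
    # Replace every name with an opaque, meaningless string.
    return {name: str(index) for index, name in enumerate(type_names)}
```

Because every reference is rewritten through the same mapping, clients can still follow references; they just cannot guess a type by its name.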
2015-09-21 | qapi: New QMP command query-qmp-schema for QMP introspection | Markus Armbruster
qapi/introspect.json defines the introspection schema. It's designed for QMP introspection, but should do for similar uses, such as QGA.

The introspection schema does not reflect all the rules and restrictions that apply to QAPI schemata. A valid QAPI schema has an introspection value conforming to the introspection schema, but the converse is not true.

Introspection lowers away a number of schema details, and makes implicit things explicit:

* The built-in types are declared with their JSON type.
  All integer types are mapped to 'int', because how many bits we use internally is an implementation detail. It could be pressed into external interface service as very approximate range information, but that's a bad idea. If we need range information, we better do it properly.

* Implicit type definitions are made explicit, and given auto-generated names:
  - Array types, named by appending "List" to the name of their element type, like in generated C.
  - The enumeration types implicitly defined by simple union types, named by appending "Kind" to the name of their simple union type, like in generated C.
  - Types that don't occur in generated C. Their names start with ':' so they don't clash with the user's names.

* All type references are by name.

* The struct and union types are generalized into an object type.

* Base types are flattened.

* Commands take a single argument and return a single result.
  Dictionary argument or list result is an implicit type definition.
  The empty object type is used when a command takes no arguments or produces no results.
  The argument is always of object type, but the introspection schema doesn't reflect that.
  The 'gen': false directive is omitted as implementation detail.
  The 'success-response' directive is omitted as well for now, even though it's not an implementation detail, because it's not used by QMP.

* Events carry a single data value.
  Implicit type definition and empty object type use, just like for commands.
  The value is of object type, but the introspection schema doesn't reflect that.

* Types not used by commands or events are omitted.
  Indirect use counts as use.

* Optional members have a default, which can only be null right now.
  Instead of a mandatory "optional" flag, we have an optional default. No default means mandatory, default null means optional without default value. Non-null is available for optional with default (possible future extension).

* Clients should *not* look up types by name, because type names are not ABI. Look up the command or event you're interested in, then follow the references.
  TODO Should we hide the type names to eliminate the temptation?

New generator scripts/qapi-introspect.py computes an introspection value for its input, and generates a C variable holding it. It can generate awfully long lines. Marked TODO.

A new test-qmp-input-visitor test case feeds its result for both tests/qapi-schema/qapi-schema-test.json and qapi-schema.json to a QmpInputVisitor to verify it actually conforms to the schema.

New QMP command query-qmp-schema takes its return value from that variable. Its reply is some 85KiBytes for me right now.

If this turns out to be too much, we have a couple of options:

* We can use shorter names in the JSON. Not the QMP style.

* Optionally return the sub-schema for commands and events given as arguments.
  Right now qmp_query_schema() sends the string literal computed by qmp-introspect.py. To compute sub-schema at run time, we'd have to duplicate parts of qapi-introspect.py in C. Unattractive.

* Let clients cache the output of query-qmp-schema.
  It changes only on QEMU upgrades, i.e. rarely. Provide a command query-qmp-schema-hash. Clients can have a cache indexed by hash, and re-query the schema only when they don't have it cached. Even simpler: put the hash in the QMP greeting.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
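The advice to start from a command and follow references, rather than look types up by a name known in advance, might look like this on the client side. The sample data is made up, and the field names ("meta-type", "arg-type", "ret-type", "members") are modelled on the introspection schema as an assumption; check them against qapi/introspect.json before relying on them.

```python
# Hypothetical query-qmp-schema reply, reduced to a few entries.
SAMPLE_SCHEMA = [
    {"name": "query-example", "meta-type": "command",
     "arg-type": "0", "ret-type": "1"},
    {"name": "0", "meta-type": "object", "members": []},
    {"name": "1", "meta-type": "object",
     "members": [{"name": "value", "type": "int"}]},
    {"name": "int", "meta-type": "builtin", "json-type": "int"},
]


def return_members(schema, command_name):
    """List the member names of a command's return type.

    The only name the client supplies is the command's; the type is
    reached by following the 'ret-type' reference.
    """
    by_name = {entry["name"]: entry for entry in schema}
    command = by_name[command_name]
    if command["meta-type"] != "command":
        raise ValueError(command_name + " is not a command")
    ret_type = by_name[command["ret-type"]]
    return [member["name"] for member in ret_type.get("members", [])]
```

Since the client never hard-codes "1" or any other type name, the lookup keeps working if the generator renames types between releases.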
2015-09-21 | qapi: Pseudo-type '**' is now unused, drop it | Markus Armbruster
'gen': false needs to stay for now, because netdev_add is still using it. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <1442401589-24189-25-git-send-email-armbru@redhat.com>
2015-09-21 | qapi-schema: Fix up misleading specification of netdev_add | Markus Armbruster
It doesn't take a 'props' argument, let alone one in the format "NAME=VALUE,..." The bogus arguments specification doesn't matter due to 'gen': false. Clean it up to be incomplete rather than wrong, and document the incompleteness. While there, improve netdev_add usage example in the manual: add a device option to show how it's done. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <1442401589-24189-24-git-send-email-armbru@redhat.com>
2015-09-21 | qapi: Introduce a first class 'any' type | Markus Armbruster
It's first class, because unlike '**', it actually works, i.e. doesn't require 'gen': false. '**' will go away next. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
2015-09-21 | qapi: Improve built-in type documentation | Markus Armbruster
Clarify how they map to JSON. Add how they map to C. Fix the reference to StringInputVisitor. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <1442401589-24189-20-git-send-email-armbru@redhat.com>
2015-09-21 | qapi-commands: De-duplicate output marshaling functions | Markus Armbruster
gen_marshal_output() uses its parameter name only for name of the generated function. Name it after the type being marshaled instead of its caller, and drop duplicates. Saves 7 copies of qmp_marshal_output_int() in qemu-ga, and one copy of qmp_marshal_output_str() in qemu-system-*. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <1442401589-24189-19-git-send-email-armbru@redhat.com>
2015-09-21 | qapi: Rename qmp_marshal_input_FOO() to qmp_marshal_FOO() | Markus Armbruster
These functions marshal both input and output. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <1442401589-24189-17-git-send-email-armbru@redhat.com>
2015-09-21 | qapi: Clean up after recent conversions to QAPISchemaVisitor | Markus Armbruster
Generate just 'FOO' instead of 'struct FOO' when possible. Drop helper functions that are now unused. Make pep8 and pylint reasonably happy. Rename generate_FOO() functions to gen_FOO() for consistency. Use more consistent and sensible variable names. Consistently use c_ for mapping keys when their value is a C identifier or type. Simplify gen_enum() and gen_visit_union(). Consistently use single quotes for C text string literals. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1442401589-24189-14-git-send-email-armbru@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2015-09-21 | qapi: De-duplicate enum code generation | Markus Armbruster
Duplicated in commit 21cd70d. Yes, we can't import qapi-types, but that's no excuse. Move the helpers from qapi-types.py to qapi.py, and replace the duplicates in qapi-event.py. The generated event enumeration type's lookup table becomes const-correct (see commit 2e4450f), and uses explicit indexes instead of relying on order (see commit 912ae9c). Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1442401589-24189-10-git-send-email-armbru@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2015-09-21 | qapi-types: Convert to QAPISchemaVisitor, fixing flat unions | Markus Armbruster
Fixes flat unions to get the base's base members. Test case is from commit 2fc0043, in qapi-schema-test.json:

    { 'union': 'UserDefFlatUnion',
      'base': 'UserDefUnionBase',
      'discriminator': 'enum1',
      'data': { 'value1' : 'UserDefA',
                'value2' : 'UserDefB',
                'value3' : 'UserDefB' } }

    { 'struct': 'UserDefUnionBase',
      'base': 'UserDefZero',
      'data': { 'string': 'str', 'enum1': 'EnumOne' } }

    { 'struct': 'UserDefZero',
      'data': { 'integer': 'int' } }

Patch's effect on UserDefFlatUnion:

     struct UserDefFlatUnion {
         /* Members inherited from UserDefUnionBase: */
    +    int64_t integer;
         char *string;
         EnumOne enum1;
         /* Own members: */
         union { /* union tag is @enum1 */
             void *data;
             UserDefA *value1;
             UserDefB *value2;
             UserDefB *value3;
         };
     };

Flat union visitors remain broken. They'll be fixed next.

Code is generated in a different order now, but that doesn't matter.

The two guards QAPI_TYPES_BUILTIN_STRUCT_DECL and QAPI_TYPES_BUILTIN_CLEANUP_DECL are replaced by just QAPI_TYPES_BUILTIN.

Two ugly special cases for simple unions now stand out like sore thumbs:

1. The type tag is named 'type' everywhere, except in generated C, where it's 'kind'.

2. QAPISchema lowers simple unions to semantically equivalent flat unions. However, the C generated for a simple union differs from the C generated for its equivalent flat union, and we therefore need special code to preserve that pointless difference for now.

Mark both TODO.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
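The fix amounts to walking the whole base chain instead of stopping at the immediate base. A minimal sketch of that walk, using the test case's type names but a made-up dictionary layout and function name (not the generator's actual data structures):

```python
# The base chain from the test case, in a simplified, assumed layout.
TYPES = {
    'UserDefZero': {'base': None,
                    'data': {'integer': 'int'}},
    'UserDefUnionBase': {'base': 'UserDefZero',
                         'data': {'string': 'str', 'enum1': 'EnumOne'}},
}


def inherited_members(types, name):
    """Collect (member, type) pairs of a struct, outermost base first."""
    entry = types[name]
    members = []
    if entry['base'] is not None:
        # Recurse so that the base's own base members are included too.
        members.extend(inherited_members(types, entry['base']))
    members.extend(entry['data'].items())
    return members
```

For 'UserDefUnionBase' the walk yields integer, string, enum1, which matches the member order in the struct shown in the commit message.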
2015-09-15 | qapi: allow override of default enum prefix naming | Daniel P. Berrange
The camel_to_upper() method applies some heuristics to turn a mixed case type name into an all-uppercase name. This is used, for example, to generate enum constant name prefixes. The heuristics don't always generate a satisfactory name though, e.g.

    { 'enum': 'QCryptoTLSCredsEndpoint',
      'data': ['client', 'server']}

results in Q_CRYPTOTLS_CREDS_ENDPOINT_CLIENT. This has an undesirable _ after the initial Q and is missing an _ between the CRYPTO & TLS strings.

Rather than try to add more and more heuristics to try to cope with this, simply allow the QAPI schema to specify the desired enum constant prefix explicitly, e.g.

    { 'enum': 'QCryptoTLSCredsEndpoint',
      'prefix': 'QCRYPTO_TLS_CREDS_ENDPOINT',
      'data': ['client', 'server']}

now gives the QCRYPTO_TLS_CREDS_ENDPOINT_CLIENT name.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
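To see why the heuristic produces that name, here is a simplified stand-in: it inserts an underscore before any uppercase letter that is followed by a lowercase one. This is an assumption-laden sketch, not the actual camel_to_upper() from scripts/qapi.py, and both function names are hypothetical.

```python
def camel_to_upper_sketch(name):
    """Simplified heuristic: '_' before an uppercase letter that starts a word."""
    out = ''
    for i, ch in enumerate(name):
        next_is_lower = i + 1 < len(name) and name[i + 1].islower()
        if ch.isupper() and i > 0 and next_is_lower:
            out += '_'
        out += ch
    return out.upper()


def enum_constant(type_name, value, prefix=None):
    """Build an enum constant name, honouring an explicit prefix if given."""
    if prefix is None:
        prefix = camel_to_upper_sketch(type_name)
    return prefix + '_' + value.upper()
```

Without a prefix, 'QCryptoTLSCredsEndpoint' and 'client' give Q_CRYPTOTLS_CREDS_ENDPOINT_CLIENT; passing prefix='QCRYPTO_TLS_CREDS_ENDPOINT' gives QCRYPTO_TLS_CREDS_ENDPOINT_CLIENT, bypassing the heuristic entirely.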
2015-09-11 | typofixes - v4 | Veres Lajos
Signed-off-by: Veres Lajos <vlajos@gmail.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-09-11 | maint: remove / fix many doubled words | Daniel P. Berrange
Many source files have doubled words (eg "the the", "to to", and so on). Most of these can simply be removed, but a couple were actual mis-spellings (eg "to to" instead of "to do"). There was even one triple word score "to to to" :-) Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-09-04 | docs: document how to configure the qcow2 L2/refcount caches | Alberto Garcia
QEMU has options to configure the size of the L2 and refcount caches for the qcow2 format. However, choosing the right sizes for a particular disk image is not a straightforward operation since the ratio between the cache size and the allocated disk space is not obvious and depends on the size of the cluster and the refcount entries. This document attempts to give an overview of both caches and how to configure their sizes. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 55de928e139b1ba3f3d40fe9c6c88f30b1f36410.1438690126.git.berto@igalia.com Signed-off-by: Max Reitz <mreitz@redhat.com>
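The ratio between cache size and covered disk space can be worked out with a little arithmetic. The sketch below assumes 8-byte L2 entries, each mapping one cluster, and a refcount width given in bits; these constants and the function names are assumptions for illustration and should be checked against the qcow2 documentation.

```python
L2_ENTRY_BYTES = 8  # assumed size of one L2 table entry


def l2_cache_coverage(l2_cache_size, cluster_size):
    """Bytes of virtual disk that an L2 cache of the given size can map."""
    entries = l2_cache_size // L2_ENTRY_BYTES
    return entries * cluster_size


def refcount_cache_coverage(refcount_cache_size, cluster_size, refcount_bits=16):
    """Bytes of disk whose refcounts fit in a refcount cache of the given size."""
    entries = refcount_cache_size * 8 // refcount_bits
    return entries * cluster_size
```

Under these assumptions, with 64 KiB clusters a 1 MiB L2 cache and a 256 KiB refcount cache (16-bit refcounts) each cover 8 GiB, which shows how the cluster size and the refcount entry size drive the ratio.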
2015-09-04 | docs/qapi-code-gen.txt: Fix QAPI schema examples | Markus Armbruster
Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2015-09-04 | qapi: Generated code cleanup | Markus Armbruster
Clean up white-space, brace placement, and superfluous #ifdef QAPI_TYPES_BUILTIN_CLEANUP_DEF. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2015-09-04 | qapi-commands: Drop useless initialization | Markus Armbruster
In generated command handlers, the assignment to retval dominates its only use. Therefore, its initialization is useless. Drop it. Suggested-by: Eric Blake <eblake@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2015-09-04 | qapi: Command returning anonymous type doesn't work, outlaw | Markus Armbruster
Reproducer: with

    { 'command': 'user_def_cmd4', 'returns': { 'a': 'int' } }

added to qapi-schema-test.json, qapi-commands.py dies when it tries to generate the command handler function

    Traceback (most recent call last):
      File "/work/armbru/qemu/scripts/qapi-commands.py", line 359, in <module>
        ret = generate_command_decl(cmd['command'], arglist, ret_type) + "\n"
      File "/work/armbru/qemu/scripts/qapi-commands.py", line 29, in generate_command_decl
        ret_type=c_type(ret_type), name=c_name(name),
      File "/work/armbru/qemu/scripts/qapi.py", line 927, in c_type
        assert isinstance(value, str) and value != ""
    AssertionError

because the return type doesn't exist. Simply outlaw this usage, and drop or dumb down test cases accordingly.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2015-09-04 | qapi: Fix to reject union command and event arguments | Markus Armbruster
A command's or event's 'data' must be a struct type, given either as a dictionary, or as struct type name. Commit dd883c6 tightened the checking there, but not enough: we still accept 'union'. Fix to reject it. We may want to support union types there, but we'll have to extend qapi-commands.py and qapi-events.py for it. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
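The two restrictions from this commit and the previous one (no anonymous return types, no union as 'data') could be checked along these lines. check_command and the known_types mapping are hypothetical; the real checks live in scripts/qapi.py and are more thorough.

```python
def check_command(expr, known_types):
    """Validate a command definition.

    expr is the command's schema dictionary; known_types maps a type
    name to its kind, e.g. 'struct' or 'union' (assumed layout).
    """
    data = expr.get('data')
    # 'data' may be an inline dictionary or the name of a struct type.
    if isinstance(data, str) and known_types.get(data) != 'struct':
        raise ValueError("'data' of %s must name a struct type"
                         % expr['command'])
    returns = expr.get('returns')
    # An inline dictionary here would be an anonymous return type.
    if isinstance(returns, dict):
        raise ValueError("'returns' of %s must not be an anonymous type"
                         % expr['command'])
```

Rejecting these cases up front gives the schema author a clear error instead of an assertion failure deep inside the generator.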
2015-09-04 | qapi-event: Clean up how name of enum QAPIEvent is made | Markus Armbruster
Use c_name() instead of ad hoc code. Doesn't upcase the -p prefix, which is an improvement in my book. Unbreaks prefix containing '.', but other funny characters remain broken. To be fixed next. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2015-09-04 | qapi: Clarify docs on including the same file multiple times | Markus Armbruster
It's idempotent. While there, update examples to current code. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
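Idempotent inclusion can be pictured as a loader that remembers which files it has already processed. The in-memory file table and the ('include', name) tuple format below are invented for the example; the actual schema parser reads JSON files from disk.

```python
def load_schema(name, files, seen=None):
    """Return the entries of a schema file with its includes expanded.

    files maps a file name to a list of entries; an entry of the form
    ('include', other) pulls in another file. A file that was already
    included contributes nothing the second time.
    """
    if seen is None:
        seen = set()
    if name in seen:
        return []
    seen.add(name)
    entries = []
    for entry in files[name]:
        if isinstance(entry, tuple) and entry[0] == 'include':
            entries.extend(load_schema(entry[1], files, seen))
        else:
            entries.append(entry)
    return entries
```

Including 'b' twice from 'a' therefore yields b's definitions once, followed by a's own.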
2015-07-22 | AioContext: optimize clearing the EventNotifier | Paolo Bonzini
It is pretty rare for aio_notify to actually set the EventNotifier. It can happen with worker threads such as thread-pool.c's, but otherwise it should never be set thanks to the ctx->notify_me optimization. The previous patch, unfortunately, added an unconditional call to event_notifier_test_and_clear; now add a userspace fast path that avoids the call. Note that it is not possible to do the same with event_notifier_set; it would break, as proved (again) by the included formal model. This patch survived over 3000 reboots on aarch64 KVM. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Tested-by: Richard W.M. Jones <rjones@redhat.com> Message-id: 1437487673-23740-7-git-send-email-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-07-22 | AioContext: fix broken placement of event_notifier_test_and_clear | Paolo Bonzini
event_notifier_test_and_clear must be called before processing events. Otherwise, an aio_poll could "eat" the notification before the main I/O thread invokes ppoll(). The main I/O thread then never wakes up. This is an example of what could happen:

    i/o thread        vcpu thread                     worker thread
    ---------------------------------------------------------------------
    lock_iothread
    notify_me = 1
    ...
    unlock_iothread
                                                      bh->scheduled = 1
                                                      event_notifier_set
                      lock_iothread
                      notify_me = 3
                      ppoll
                      notify_me = 1
                      aio_dispatch
                       aio_bh_poll
                        thread_pool_completion_bh
                                                      bh->scheduled = 1
                                                      event_notifier_set
                       node->io_read(node->opaque)
                        event_notifier_test_and_clear
    ppoll
    *** hang ***

"Tracing" with qemu_clock_get_ns shows pretty much the same behavior as in the previous bug, so there are no new tricks here---just stare more at the code until it is apparent.

One could also use a formal model, of course. The included one shows this with three processes: notifier corresponds to a QEMU thread pool worker, temporary_waiter to a VCPU thread that invokes aio_poll(), waiter to the main I/O thread. I would be happy to say that the formal model found the bug for me, but actually I wrote it after the fact.

This patch is a bit of a big hammer. The next one optimizes it, with help (this time for real rather than a posteriori :)) from another, similar formal model.

Reported-by: Richard W. M. Jones <rjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Message-id: 1437487673-23740-6-git-send-email-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-07-22 | AioContext: fix broken ctx->dispatching optimization | Paolo Bonzini
This patch rewrites the ctx->dispatching optimization, which was the cause of some mysterious hangs that could be reproduced on aarch64 KVM only. The hangs were indirectly caused by aio_poll() and in particular by flash memory updates' call to blk_write(), which invokes aio_poll().

Fun stuff: they had an extremely short race window, so much that adding all kinds of tracing to either the kernel or QEMU made it go away (a single printf made it half as reproducible). On the plus side, the failure mode (a hang until the next keypress) made it very easy to examine the state of the process with a debugger. And there was a very nice reproducer from Laszlo, which failed pretty often (more than half of the time) on any version of QEMU with a non-debug kernel; it also failed fast, while still in the firmware. So, it could have been worse.

For some unknown reason they happened only with virtio-scsi, but that's not important. It's more interesting that they disappeared with io=native, making thread-pool.c a likely suspect for where the bug arose. thread-pool.c is also one of the few places which use bottom halves across threads, by the way.

I hope that no other similar bugs exist, but just in case :) I am going to describe how the successful debugging went...

Since the likely culprit was the ctx->dispatching optimization, which mostly affects bottom halves, the first observation was that there are two qemu_bh_schedule() invocations in the thread pool: the one in the aio worker and the one in thread_pool_completion_bh. The latter always causes the optimization to trigger, the former may or may not.
In order to restrict the possibilities, I introduced new functions qemu_bh_schedule_slow() and qemu_bh_schedule_fast():

    /* qemu_bh_schedule_slow: */
    ctx = bh->ctx;
    bh->idle = 0;
    if (atomic_xchg(&bh->scheduled, 1) == 0) {
        event_notifier_set(&ctx->notifier);
    }

    /* qemu_bh_schedule_fast: */
    ctx = bh->ctx;
    bh->idle = 0;
    assert(ctx->dispatching);
    atomic_xchg(&bh->scheduled, 1);

Notice how the atomic_xchg is still in qemu_bh_schedule_slow(). This was already debated a few months ago, so I assumed it to be correct. In retrospect this was a very good idea, as you'll see later.

Changing thread_pool_completion_bh() to qemu_bh_schedule_fast() didn't trigger the assertion (as expected). Changing the worker's invocation to qemu_bh_schedule_slow() didn't hide the bug (another assumption which luckily held). This already heavily limited the amount of interaction between the threads, hinting that the problematic events must have triggered around thread_pool_completion_bh().

As mentioned earlier, invoking a debugger to examine the state of a hung process was pretty easy; the iothread was always waiting on a poll(..., -1) system call. Infinite timeouts are much rarer on x86, and this could be the reason why the bug was never observed there. With the buggy sequence more or less resolved to an interaction between thread_pool_completion_bh() and poll(..., -1), my "tracing" strategy was to just add a few qemu_clock_get_ns(QEMU_CLOCK_REALTIME) calls, hoping that the ordering of aio_ctx_prepare(), aio_ctx_dispatch, poll() and qemu_bh_schedule_fast() would provide some hint. The output was:

    (gdb) p last_prepare
    $3 = 103885451
    (gdb) p last_dispatch
    $4 = 103876492
    (gdb) p last_poll
    $5 = 115909333
    (gdb) p last_schedule
    $6 = 115925212

Notice how the last call to qemu_poll_ns() came after aio_ctx_dispatch().
This makes little sense unless there is an aio_poll() call involved, and indeed with a slightly different instrumentation you can see that there is one:

    (gdb) p last_prepare
    $3 = 107569679
    (gdb) p last_dispatch
    $4 = 107561600
    (gdb) p last_aio_poll
    $5 = 110671400
    (gdb) p last_schedule
    $6 = 110698917

So the scenario becomes clearer:

    iothread                   VCPU thread
    --------------------------------------------------------------------------
    aio_ctx_prepare
    aio_ctx_check
    qemu_poll_ns(timeout=-1)
                               aio_poll
                                 aio_dispatch
                                   thread_pool_completion_bh
                                     qemu_bh_schedule()

At this point bh->scheduled = 1 and the iothread has not been woken up. The solution must be close, but this alone should not be a problem, because the bottom half is only rescheduled to account for rare situations (see commit 3c80ca1, thread-pool: avoid deadlock in nested aio_poll() calls, 2014-07-15).

Introducing a third thread---a thread pool worker thread, which also does qemu_bh_schedule()---does bring out the problematic case. The third thread must be awakened *after* the callback is complete and thread_pool_completion_bh has redone the whole loop, explaining the short race window. And then this is what happens:

    thread pool worker
    --------------------------------------------------------------------------
    <I/O completes>
    qemu_bh_schedule()

Tada, bh->scheduled is already 1, so qemu_bh_schedule() does nothing and the iothread is never woken up. This is where the bh->scheduled optimization comes into play---it is correct, but removing it would have masked the bug.

So, what is the bug?

Well, the question asked by the ctx->dispatching optimization ("is any active aio_poll dispatching?") was wrong. The right question to ask instead is "is any active aio_poll *not* dispatching", i.e. in the prepare or poll phases? In that case, the aio_poll is sleeping or might go to sleep anytime soon, and the EventNotifier must be invoked to wake it up.

In any other case (including if there is *no* active aio_poll at all!) we can just wait for the next prepare phase to pick up the event (e.g. a bottom half); the prepare phase will avoid the blocking and service the bottom half.

Expressing the invariant with a logic formula, the broken one looked like:

    !(exists(thread): in_dispatching(thread)) => !optimize

or equivalently:

    !(exists(thread): in_aio_poll(thread) && in_dispatching(thread)) => !optimize

In the correct one, the negation is in a slightly different place:

    (exists(thread): in_aio_poll(thread) && !in_dispatching(thread)) => !optimize

or equivalently:

    (exists(thread): in_prepare_or_poll(thread)) => !optimize

Even if the difference boils down to moving an exclamation mark :) the implementation is quite different. However, I think the new one is simpler to understand.

In the old implementation, the "exists" was implemented with a boolean value. This didn't really support well the case of multiple concurrent event loops, but I thought that this was okay: aio_poll holds the AioContext lock so there cannot be concurrent aio_poll invocations, and I was just considering nested event loops. However, aio_poll _could_ indeed be concurrent with the GSource. This is why I came up with the wrong invariant.

In the new implementation, "exists" is computed simply by counting how many threads are in the prepare or poll phases. There are some interesting points to consider, but the gist of the idea remains:

1) AioContext can be used through GSource as well; as mentioned in the patch, bit 0 of the counter is reserved for the GSource.

2) the counter need not be updated for a non-blocking aio_poll, because it won't sleep forever anyway. This is just a matter of checking the "blocking" variable. This requires some changes to the win32 implementation, but is otherwise not too complicated.

3) as mentioned above, the new implementation will not call aio_notify when there is *no* active aio_poll at all. The tests have to be adjusted for this change. The calls to aio_notify in async.c are fine; they only want to kick aio_poll out of a blocking wait, but need not do anything if aio_poll is not running.

4) nested aio_poll: these just work with the new implementation; when a nested event loop is invoked, the outer event loop is never in the prepare or poll phases. The outer event loop thus has already decremented the counter.

Reported-by: Richard W. M. Jones <rjones@redhat.com>
Reported-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Message-id: 1437487673-23740-5-git-send-email-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
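The counting scheme in points 1) to 3) can be modelled in a few lines. This is a single-threaded toy that ignores the memory barriers and atomics the real code depends on; the class and method names are invented, and the choice of adding 2 per blocking aio_poll is an assumption inferred from bit 0 being reserved for the GSource.

```python
class AioContextModel:
    """Toy model of the counter of event loops in the prepare/poll phases."""

    def __init__(self):
        self.notify_me = 0     # bit 0: GSource; upper bits: blocking aio_poll
        self.notified = False  # stands in for the EventNotifier being set

    def gsource_prepare(self):
        self.notify_me |= 1    # GSource is about to poll

    def gsource_check(self):
        self.notify_me &= ~1   # GSource left the poll phase

    def aio_poll_enter(self, blocking):
        if blocking:           # non-blocking polls never sleep forever
            self.notify_me += 2

    def aio_poll_leave(self, blocking):
        if blocking:
            self.notify_me -= 2

    def aio_notify(self):
        # Only wake someone up if an event loop may be (about to be) asleep.
        if self.notify_me:
            self.notified = True
```

With no active event loop, aio_notify() does nothing, which is the behaviour point 3) describes; once the GSource or a blocking aio_poll has entered its prepare phase, the notifier is set.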
2015-07-20 | Revert "vhost-user: add multi queue support" | Michael S. Tsirkin
This reverts commit 830d70db692e374b55555f4407f96a1ceefdcc97. The interface isn't fully backwards-compatible, which is bad. Let's redo this properly after 2.4. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-07-07 | Merge remote-tracking branch 'remotes/agraf/tags/signed-ppc-for-upstream' into staging | Peter Maydell
Patch queue for ppc - 2015-07-07

A few last minute PPC changes for 2.4:

  - spapr: Update SLOF
  - spapr: Fix a few bugs
  - spapr: Preparation for hotplug
  - spapr: Minor code cleanups
  - linux-user: Add mftb handling
  - kvm: Enable hugepage support with memory-backend-file
  - mac99: Remove nonexistent interrupt pin (Mac OS 9 fix)

# gpg: Signature made Tue Jul 7 16:48:41 2015 BST using RSA key ID 03FEDC60
# gpg: Good signature from "Alexander Graf <agraf@suse.de>"
# gpg:                 aka "Alexander Graf <alex@csgraf.de>"

* remotes/agraf/tags/signed-ppc-for-upstream: (30 commits)
  sPAPR: Clear stale MSIx table during EEH reset
  sPAPR: Reenable EEH functionality on reboot
  sPAPR: Don't enable EEH on emulated PCI devices
  spapr-vty: Use TYPE_ definition instead of hardcoding
  spapr_vty: lookup should only return valid VTY objects
  spapr_pci: drop redundant args in spapr_[populate, create]_pci_child_dt
  spapr_pci: populate ibm,loc-code
  spapr_pci: enumerate and add PCI device tree
  xics_kvm: Don't enable KVM_CAP_IRQ_XICS if already enabled
  ppc: Update cpu_model in MachineState
  spapr: Consolidate cpu init code into a routine
  spapr: Reorganize CPU dt generation code
  cpus: Add a macro to walk CPUs in reverse
  spapr: Support ibm, lrdr-capacity device tree property
  spapr: Consider max_cpus during xics initialization
  Revert "hw/ppc/spapr_pci.c: Avoid functions not in glib 2.12 (g_hash_table_iter_*)"
  spapr_iommu: translate sPAPRTCEAccess to IOMMUAccessFlags
  spapr_iommu: drop erroneous check in h_put_tce_indirect()
  spapr_pci: set device node unit address as hex
  spapr_pci: encode class code including Prog IF register
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-07-07 | Merge remote-tracking branch 'remotes/juanquintela/tags/migration/20150707' into staging | Peter Maydell
migration/next for 20150707

# gpg: Signature made Tue Jul 7 13:56:30 2015 BST using RSA key ID 5872D723
# gpg: Good signature from "Juan Quintela <quintela@redhat.com>"
# gpg:                 aka "Juan Quintela <quintela@trasno.org>"

* remotes/juanquintela/tags/migration/20150707: (28 commits)
  migration: extend migration_bitmap
  migration: protect migration_bitmap
  check_section_footers: Check the correct section_id
  migration: Add migration events on target side
  migration: Make events a capability
  migration: create migration event
  migration: No need to call trace_migrate_set_state()
  migration: Use always helper to set state
  migration: ensure we start in NONE state
  migration: Use cmpxchg correctly
  migration: Add configuration section
  vmstate: Create optional sections
  global_state: Make section optional
  migration: create new section to store global state
  runstate: migration allows more transitions now
  runstate: Add runstate store
  Fix older machine type compatibility on power with section footers
  Fail more cleanly in mismatched RAM cases
  Sanity check RDMA remote data
  Sort destination RAMBlocks to be the same as the source
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-07-07 | spapr: Support ibm, lrdr-capacity device tree property | Bharata B Rao
Add support for ibm,lrdr-capacity since this is needed by the guest kernel to know about the possible hot-pluggable CPUs and Memory. With this, pseries kernels will start reporting correct maxcpus in /sys/devices/system/cpu/possible.
Also define the minimum hotpluggable memory size as 256MB.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[agraf: Fix compile error on 32bit hosts]
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-07-07 | migration: create migration event | Juan Quintela
We have one argument that tells us what event has happened.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2015-07-07 | rocker: mark copy-to-cpu pkts as forwarding offloaded | Scott Feldman
For pkts copied to the CPU (to be processed by guest driver), mark the Rx descriptor with flag "OFFLOAD_FWD" to indicate device has already forwarded pkt. The guest driver will use this indicator to avoid duplicate forwarding in the guest OS.
Examples include bcast/mcast/unknown ucast pkts flooded to bridged ports. We want to avoid both the device and the guest bridge driver flooding these pkts, which would result in duplicate pkts on the wire.
Packet sampling, such as sFlow, can also use this technique to mark pkts for the guest OS to record but otherwise drop.
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Message-id: 1435746792-41278-5-git-send-email-sfeldma@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
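The flag handshake described in this commit message can be sketched roughly as follows. This is an illustrative Python model only, not the rocker device or driver code: the bit value `RX_FLAG_OFFLOAD_FWD` and both function names are hypothetical, chosen just to show the device setting the mark and the guest checking it.

```python
# Hypothetical bit position; the real value is defined by the rocker headers.
RX_FLAG_OFFLOAD_FWD = 1 << 0


def device_mark_copy_to_cpu(rx_flags: int) -> int:
    """Device side: flag a packet copied to the CPU as already forwarded."""
    return rx_flags | RX_FLAG_OFFLOAD_FWD


def guest_should_forward(rx_flags: int) -> bool:
    """Guest side: forward in software only if the device has not done so."""
    return not (rx_flags & RX_FLAG_OFFLOAD_FWD)
```

With this split, a flooded broadcast packet that the device has already sent out of the bridged ports reaches the guest with the flag set, so the guest bridge skips its own flood.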
2015-07-03 | update pci-bridge-seat section in docs/multiseat.txt | Gerd Hoffmann
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2015-07-03 | virtio-input: add input routing support | Gerd Hoffmann
Add display and head properties for input routing to virtio-input devices, update multiseat documentation.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2015-07-02 | qapi: Rename 'dirty-bitmap' mode to 'incremental' | John Snow
If we wish to make differential backups a feature that's easy to access, it might be pertinent to rename the "dirty-bitmap" mode to "incremental" to make it clear what /type/ of backup the dirty-bitmap is helping us perform.
This is an API breaking change, but 2.4 has not yet gone live, so we have this flexibility.
Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1433463642-21840-2-git-send-email-jsnow@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-06-23 | add pci-bridge-seat | Gerd Hoffmann
Simplifies multiseat configuration, see docs/multiseat.txt update for details.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-06-18 | qapi: Drop bogus command from docs | Markus Armbruster
Commit 87a560c4 added it in the wrong place. Commit 59a2c4ce added it in the right place, but didn't remove it from the wrong place. Do that now.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2015-06-12 | Merge remote-tracking branch 'remotes/stefanha/tags/net-pull-request' into staging | Peter Maydell
# gpg: Signature made Fri Jun 12 13:57:20 2015 BST using RSA key ID 81AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>"
* remotes/stefanha/tags/net-pull-request:
  qmp/hmp: add rocker device support
  rocker: bring link up/down on PHY enable/disable
  rocker: update tests using hw-derived interface names
  rocker: Add support for phys name
  iohandler: Change return type of qemu_set_fd_handler to "void"
  event-notifier: Always return 0 for posix implementation
  xen_backend: Remove unused error handling of qemu_set_fd_handler
  oss: Remove unused error handling of qemu_set_fd_handler
  alsaaudio: Remove unused error handling of qemu_set_fd_handler
  main-loop: Drop qemu_set_fd_handler2
  Change qemu_set_fd_handler2(..., NULL, ...) to qemu_set_fd_handler
  tap: Drop tap_can_send
  net/socket: Drop net_socket_can_send
  netmap: Drop netmap_can_send
  l2tpv3: Drop l2tpv3_can_send
  stubs: Add qemu_set_fd_handler
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-06-12 | rocker: Add support for phys name | David Ahern
Add ROCKER_TLV_CMD_PORT_SETTINGS_PHYS_NAME to port settings. This attribute exports the port name to the guest OS allowing it to name interfaces with sensible defaults.
Mostly done by Scott for phys_id support; adapted to phys_name by David.
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Message-id: 1433985681-56138-2-git-send-email-sfeldma@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-06-12 | migration: Use normal VMStateDescriptions for Subsections | Juan Quintela
We create optional sections with this patch. But we already have optional subsections. Instead of having two mechanisms that do the same thing, we can just generalize it.
For subsections we just change:
  - Add a needed function to VMStateDescription
  - Remove VMStateSubsection (after removal of the needed function it is just a VMStateDescription)
  - Adjust the whole tree, moving the needed function to the corresponding VMStateDescription
Signed-off-by: Juan Quintela <quintela@redhat.com>
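The generalization described in this commit message can be illustrated with a small Python model. It is a sketch under stated assumptions, not QEMU code: `Description` is a simplified, hypothetical stand-in for VMStateDescription, and `sections_to_save` is an invented helper showing how one optional `needed` callback can serve both top-level sections and nested subsections.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Description:
    """Simplified, hypothetical stand-in for a VMStateDescription."""
    name: str
    # None means the section is always saved.
    needed: Optional[Callable[[dict], bool]] = None
    # Subsections are plain descriptions too; no separate wrapper type.
    subsections: List["Description"] = field(default_factory=list)


def sections_to_save(desc: Description, state: dict) -> List[str]:
    """Return names of desc and nested subsections whose needed() passes."""
    if desc.needed is not None and not desc.needed(state):
        return []
    names = [desc.name]
    for sub in desc.subsections:
        names.extend(sections_to_save(sub, state))
    return names
```

Because the predicate lives on the description itself, a subsection needs no extra wrapper: it is just another description placed in the parent's list.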
2015-06-10 | fw_cfg: insert fw_cfg file blobs via qemu cmdline | Gabriel L. Somlo
Allow user supplied files to be inserted into the fw_cfg device before starting the guest. Since fw_cfg_add_file() already disallows duplicate fw_cfg file names, qemu will exit with an error message if the user supplies multiple blobs with the same fw_cfg file name, or if a blob name collides with a fw_cfg name programmatically added from within the QEMU source code.
A warning message will be printed if the fw_cfg item name does not begin with the prefix "opt/", which is recommended for external, user provided blobs.
Signed-off-by: Gabriel Somlo <somlo@cmu.edu>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
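The two naming rules in this commit message (duplicates are fatal, a missing "opt/" prefix only warns) can be modeled in a few lines of Python. This is a hedged sketch and not the QEMU implementation: `FwCfgNameError`, `add_user_blob`, and the dictionary registry are hypothetical names, and the message texts are illustrative rather than QEMU's actual output.

```python
class FwCfgNameError(Exception):
    """Raised when a blob name is already registered (hypothetical type)."""


def add_user_blob(registry: dict, name: str, data: bytes) -> list:
    """Register a user blob; return warnings, raise on a duplicate name."""
    # A name collision is an error whether the earlier entry came from
    # the user or was added programmatically.
    if name in registry:
        raise FwCfgNameError(f"duplicate fw_cfg file name: {name}")
    warnings = []
    # Names outside "opt/" are accepted, but only with a warning.
    if not name.startswith("opt/"):
        warnings.append(f'fw_cfg item name "{name}" should begin with "opt/"')
    registry[name] = data
    return warnings
```

Keeping user blobs under "opt/" means they cannot collide with names that QEMU itself registers elsewhere in the namespace.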
2015-06-04 | Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging | Peter Maydell
pc, acpi, virtio, tpm
This includes pxb support by Marcel, as well as multiple enhancements all over the place.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Thu Jun 4 11:51:02 2015 BST using RSA key ID D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>"
* remotes/mst/tags/for_upstream: (28 commits)
  vhost: logs sharing
  hw/acpi: piix4_pm_init(): take fw_cfg object no more
  hw/acpi: move "etc/system-states" fw_cfg file from PIIX4 to core
  hw/acpi: acpi_pm1_cnt_init(): take "disable_s3" and "disable_s4"
  pc-dimm: don't assert if pc-dimm alignment != hotpluggable mem range size
  docs: Add PXB documentation
  apci: fix PXB behaviour if used with unsupported BIOS
  hw/pxb: add numa_node parameter
  hw/pci: add support for NUMA nodes
  hw/pxb: add map_irq func
  hw/pci: inform bios if the system has extra pci root buses
  hw/pci: introduce PCI Expander Bridge (PXB)
  hw/pci: removed 'rootbus nr is 0' assumption from qmp_pci_query
  hw/acpi: remove from root bus 0 the crs resources used by other buses.
  hw/acpi: add _CRS method for extra root busses
  hw/apci: add _PRT method for extra PCI root busses
  hw/acpi: add support for i440fx 'snooping' root busses
  hw/pci: extend PCI config access to support devices behind PXB
  hw/i386: query only for q35/pc when looking for pci host bridge
  hw/pci: made pci_bus_num a PCIBusClass method
  ...
Conflicts: hw/i386/pc_piix.c
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-06-04 | Merge remote-tracking branch 'remotes/agraf/tags/signed-ppc-for-upstream' into staging | Peter Maydell
Patch queue for ppc - 2015-06-03
Highlights this time around:
  - sPAPR: endian fixes, speedups, bug fixes, hotplug basics
  - add default ram size capability for machines (sPAPR defaults to 512MB now)
# gpg: Signature made Wed Jun 3 22:59:09 2015 BST using RSA key ID 03FEDC60
# gpg: Good signature from "Alexander Graf <agraf@suse.de>"
# gpg: aka "Alexander Graf <alex@csgraf.de>"
* remotes/agraf/tags/signed-ppc-for-upstream: (40 commits)
  softmmu: support up to 12 MMU modes
  tcg: add TCG_TARGET_TLB_DISPLACEMENT_BITS
  tci: do not use CPUArchState in tcg-target.h
  Add David Gibson for sPAPR in MAINTAINERS file
  pseries: Enable in-kernel H_LOGICAL_CI_{LOAD, STORE} implementations
  spapr: override default ram size to 512MB
  machine: add default_ram_size to machine class
  spapr_pci: emit hotplug add/remove events during hotplug
  spapr_pci: enable basic hotplug operations
  pci: make pci_bar useable outside pci.c
  spapr_pci: create DRConnectors for each PCI slot during PHB realize
  spapr_pci: add dynamic-reconfiguration option for spapr-pci-host-bridge
  spapr_drc: add spapr_drc_populate_dt()
  spapr_events: event-scan RTAS interface
  spapr_events: re-use EPOW event infrastructure for hotplug events
  spapr_rtas: add ibm, configure-connector RTAS interface
  spapr: add rtas_st_buffer_direct() helper
  spapr_rtas: add get-sensor-state RTAS interface
  spapr_rtas: add set-indicator RTAS interface
  spapr_rtas: add get/set-power-level RTAS interfaces
  ...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-06-03 | docs: add sPAPR hotplug/dynamic-reconfiguration documentation | Michael Roth
This adds a general overview of hotplug/dynamic-reconfiguration for sPAPR/pSeries guests, as specified in PAPR+ v2.7.
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-06-03 | docs: Add PXB documentation | Marcel Apfelbaum
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
2015-06-03 | docs/writing-qmp-commands: fix a typo | Chen Hanxiao
s/interation/iteration
Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-06-01 | vhost-user: add multi queue support | Ouyang Changchun
Based on patch by Nikolay Nikolaev:
Vhost-user will implement the multi queue support in a similar way to what vhost already has - a separate thread for each queue.
To enable the multi queue functionality - a new command line parameter "queues" is introduced for the vhost-user netdev.
Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
Signed-off-by: Changchun Ouyang <changchun.ouyang@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>