aboutsummaryrefslogtreecommitdiff
path: root/qobject/json-streamer.c
AgeCommit message (Collapse)Author
2018-08-24json: Clean up headersMarkus Armbruster
The JSON parser has three public headers, json-lexer.h, json-parser.h, json-streamer.h. They all contain stuff that is of no interest outside qobject/json-*.c. Collect the public interface in include/qapi/qmp/json-parser.h, and everything else in qobject/json-parser-int.h. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-54-armbru@redhat.com>
2018-08-24qobject: Drop superfluous includes of qemu-common.hMarkus Armbruster
Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-53-armbru@redhat.com>
2018-08-24json: Make JSONToken opaque outside json-parser.cMarkus Armbruster
Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-52-armbru@redhat.com>
2018-08-24json: Unbox tokens queue in JSONMessageParserMarkus Armbruster
Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-51-armbru@redhat.com>
2018-08-24json: Streamline json_message_process_token()Markus Armbruster
Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-50-armbru@redhat.com>
2018-08-24json: Enforce token count and size limits more tightlyMarkus Armbruster
Token count and size limits exist to guard against excessive heap usage. We check them only after we created the token on the heap. That's assigning a cowboy to the barn to lasso the horse after it has bolted. Close the barn door instead: check before we create the token. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-49-armbru@redhat.com>
2018-08-24json: Fix streamer not to ignore trailing unterminated structuresMarkus Armbruster
json_message_process_token() accumulates tokens until it got the sequence of tokens that comprise a single JSON value (it counts curly braces and square brackets to decide). It feeds those token sequences to json_parser_parse(). If a non-empty sequence of tokens remains at the end of the parse, it's silently ignored. check-qjson.c cases unterminated_array(), unterminated_array_comma(), unterminated_dict(), unterminated_dict_comma() demonstrate this bug. Fix as follows. Introduce a JSON_END_OF_INPUT token. When the streamer receives it, it feeds the accumulated tokens to json_parser_parse(). Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-46-armbru@redhat.com>
2018-08-24json: Pass lexical errors and limit violations to callbackMarkus Armbruster
The callback to consume JSON values takes QObject *json, Error *err. If both are null, the callback is supposed to make up an error by itself. This sucks. qjson.c's consume_json() neglects to do so, which makes qobject_from_json() null instead of failing. I consider that a bug. The culprit is json_message_process_token(): it passes two null pointers when it runs into a lexical error or a limit violation. Fix it to pass a proper Error object then. Update the callbacks: * monitor.c's handle_qmp_command(): the code to make up an error is now dead, drop it. * qga/main.c's process_event(): lumps the "both null" case together with the "not a JSON object" case. The former is now gone. The error message "Invalid JSON syntax" is misleading for the latter. Improve it to "Input must be a JSON object". * qobject/qjson.c's consume_json(): no update; check-qjson demonstrates qobject_from_json() now sets an error on lexical errors, but still doesn't on some other errors. * tests/libqtest.c's qmp_response(): the Error object is now reliable, so use it to improve the error message. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-40-armbru@redhat.com>
2018-08-24json: Treat unwanted interpolation as lexical errorMarkus Armbruster
The JSON parser optionally supports interpolation. The lexer recognizes interpolation tokens unconditionally. The parser rejects them when interpolation is disabled, in parse_interpolation(). However, it neglects to set an error then, which can make json_parser_parse() fail without setting an error. Move the check for unwanted interpolation from the parser's parse_interpolation() into the lexer's finite state machine. When interpolation is disabled, '%' is now handled like any other unexpected character. The next commit will improve how such lexical errors are handled. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-39-armbru@redhat.com>
2018-08-24json: Don't create JSON_ERROR tokens that won't be usedMarkus Armbruster
Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-37-armbru@redhat.com>
2018-08-24json: Don't pass null @tokens to json_parser_parse()Markus Armbruster
json_parser_parse() normally returns the QObject on success. Except it returns null when its @tokens argument is null. Its only caller json_message_process_token() passes null @tokens when emitting a lexical error. The call is a rather opaque way to say json = NULL then. Simplify matters by lifting the assignment to json out of the emit path: initialize json to null, set it to the value of json_parser_parse() when there's no lexical error. Drop the special case from json_parser_parse(). Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-36-armbru@redhat.com>
2018-08-24json: Redesign the callback to consume JSON valuesMarkus Armbruster
The classical way to structure parser and lexer is to have the client call the parser to get an abstract syntax tree, the parser call the lexer to get the next token, and the lexer call some function to get input characters. Another way to structure them would be to have the client feed characters to the lexer, the lexer feed tokens to the parser, and the parser feed abstract syntax trees to some callback provided by the client. This way is more easily integrated into an event loop that dispatches input characters as they arrive. Our JSON parser is kind of between the two. The lexer feeds tokens to a "streamer" instead of a real parser. The streamer accumulates tokens until it got the sequence of tokens that comprise a single JSON value (it counts curly braces and square brackets to decide). It feeds those token sequences to a callback provided by the client. The callback passes each token sequence to the parser, and gets back an abstract syntax tree. I figure it was done that way to make a straightforward recursive descent parser possible. "Get next token" becomes "pop the first token off the token sequence". Drawback: we need to store a complete token sequence. Each token eats 13 + input characters + malloc overhead bytes. Observations: 1. This is not the only way to use recursive descent. If we replaced "get next token" by a coroutine yield, we could do without a streamer. 2. The lexer reports errors by passing a JSON_ERROR token to the streamer. This communicates the offending input characters and their location, but no more. 3. The streamer reports errors by passing a null token sequence to the callback. The (already poor) lexical error information is thrown away. 4. Having the callback receive a token sequence duplicates the code to convert token sequence to abstract syntax tree in every callback. 5. Known bug: the streamer silently drops incomplete token sequences. This commit rectifies 4. by lifting the call of the parser from the callbacks into the streamer. Later commits will address 3. and 5. The lifting removes a bug from qjson.c's parse_json(): it passed a pointer to a non-null Error * in certain cases, as demonstrated by check-qjson.c. json_parser_parse() is now unused. It's a stupid wrapper around json_parser_parse_err(). Drop it, and rename json_parser_parse_err() to json_parser_parse(). Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-35-armbru@redhat.com>
2018-08-24json: Have lexer call streamer directlyMarkus Armbruster
json_lexer_init() takes the function to process a token as an argument. It's always json_message_process_token(). Makes the code harder to understand for no actual gain. Drop the indirection. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180823164025.12553-34-armbru@redhat.com>
2018-08-24json: remove useless return value from lexer/parserMarc-André Lureau
The lexer always returns 0 when char feeding. Furthermore, none of the caller care about the return value. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20180326150916.9602-10-marcandre.lureau@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180823164025.12553-32-armbru@redhat.com>
2016-07-12json-streamer: fix double-free on exiting during a parsePaolo Bonzini
Now that json-streamer tries not to leak tokens on incomplete parse, the tokens can be freed twice if QEMU destroys the json-streamer object during the parser->emit call. To fix this, create the new empty GQueue earlier, so that it is already in place when the old one is passed to parser->emit. Reported-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1467636059-12557-1-git-send-email-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-30json-streamer: Don't leak tokens on incomplete parseEric Blake
Valgrind complained about a number of leaks in tests/check-qobject-json: ==12657== definitely lost: 17,247 bytes in 1,234 blocks All of which had the same root cause: on an incomplete parse, we were abandoning the token queue without cleaning up the allocated data within each queue element. Introduced in commit 95385fe, when we switched from QList (which recursively frees contents) to g_queue (which does not). We don't yet require glib 2.32 with its g_queue_free_full(), so open-code it instead. CC: qemu-stable@nongnu.org Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1463608012-12760-1-git-send-email-eblake@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-02-04qobject: Clean up includesPeter Maydell
Clean up includes so that osdep.h is included first and headers which it implies are not included manually. This commit was created with scripts/clean-includes. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1454089805-5470-12-git-send-email-peter.maydell@linaro.org
2015-11-26qjson: Limit number of tokens in addition to total sizeMarkus Armbruster
Commit 29c75dd "json-streamer: limit the maximum recursion depth and maximum token count" attempts to guard against excessive heap usage by limiting total token size (it says "token count", but that's a lie). Total token size is a rather imprecise predictor of heap usage: many small tokens use more space than few large tokens with the same input size, because there's a constant per-token overhead: 37 bytes on my system. Tighten this up: limit the token count to 2Mi. Chosen to roughly match the 64MiB total token size limit. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <1448486613-17634-13-git-send-email-armbru@redhat.com>
2015-11-26qjson: surprise, allocating 6 QObjects per token is expensivePaolo Bonzini
Replace the contents of the tokens GQueue with a simple struct. This cuts the amount of memory allocated by tests/check-qjson from ~500MB to ~20MB, and the execution time from 600ms to 80ms on my laptop. Still a lot (some could be saved by using an intrusive list, such as QSIMPLEQ, instead of the GQueue), but the savings are already massive and the right thing to do would probably be to get rid of json-streamer completely. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1448300659-23559-5-git-send-email-pbonzini@redhat.com> [Straightforwardly rebased on my patches] Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2015-11-26qjson: store tokens in a GQueuePaolo Bonzini
Even though we still have the "streamer" concept, the tokens can now be deleted as they are read. While doing so convert from QList to GQueue, since the next step will make tokens not a QObject and we will have to do the conversion anyway. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1448300659-23559-4-git-send-email-pbonzini@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2015-11-26qjson: replace QString in JSONLexer with GStringPaolo Bonzini
JSONLexer only needs a simple resizable buffer. json-streamer.c can allocate memory for each token instead of relying on reference counting of QStrings. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1448300659-23559-2-git-send-email-pbonzini@redhat.com> [Straightforwardly rebased on my patches, checkpatch made happy] Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2015-11-26qjson: Give each of the six structural chars its own token typeMarkus Armbruster
Simplifies things, because we always check for a specific one. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1448486613-17634-6-git-send-email-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2015-11-26qjson: Don't crash when input exceeds nesting limitMarkus Armbruster
We limit nesting depth and input size to defend against input triggering excessive heap or stack memory use (commit 29c75dd json-streamer: limit the maximum recursion depth and maximum token count). However, when the nesting limit is exceeded, parser_context_peek_token()'s assertion fails. Broken in commit 65c0f1e "json-parser: don't replicate tokens at each level of recursion". To reproduce stuff 1025 open braces or brackets into QMP. Fix by taking the error exit instead of the normal one. Reported-by: Eric Blake <eblake@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <1448486613-17634-3-git-send-email-armbru@redhat.com>
2015-11-26qjson: Apply nesting limit more sanelyMarkus Armbruster
The nesting limit from commit 29c75dd "json-streamer: limit the maximum recursion depth and maximum token count" applies separately to braces and brackets. This makes no sense. Apply it to their sum, because that's actually a measure of recursion depth. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <1448486613-17634-2-git-send-email-armbru@redhat.com>
2013-01-12build: move qobject files to qobject/ and libqemuutil.aPaolo Bonzini
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>