diff options
author | Markus Armbruster <armbru@redhat.com> | 2013-04-11 18:07:21 +0200 |
---|---|---|
committer | Blue Swirl <blauwirbel@gmail.com> | 2013-04-13 19:40:25 +0000 |
commit | e2ec3f976803b360c70d9ae2ba13852fa5d11665 (patch) | |
tree | 55b35131f8eceadc89e793ccba46c542194742a8 /hw/lm32/milkymist.c | |
parent | 1d50c8e947180174acb02bad9ff95e0aee6249ea (diff) |
qjson: to_json() case QTYPE_QSTRING is buggy, rewrite
Known bugs in to_json():
* A start byte for a three-byte sequence followed by less than two
continuation bytes is split into one-byte sequences.
* Start bytes for sequences longer than three bytes get misinterpreted
as start bytes for three-byte sequences. Continuation bytes beyond
byte three become one-byte sequences.
This means all characters outside the BMP are decoded incorrectly.
* One-byte sequences with the MSB are put into the JSON string
verbatim when char is unsigned, producing invalid UTF-8. When char
is signed, they're replaced by "\\uFFFF" instead.
This includes \xFE, \xFF, and stray continuation bytes.
* Overlong sequences are happily accepted, unless screwed up by the
bugs above.
* Likewise, sequences encoding surrogate code points or noncharacters.
* Unlike other control characters, ASCII DEL is not escaped. Except
in overlong encodings.
My rewrite fixes them as follows:
* Malformed UTF-8 sequences are replaced.
Except the overlong encoding \xC0\x80 of U+0000 is still accepted.
Permits embedding NUL characters in C strings. This trick is known
as "Modified UTF-8".
* Sequences encoding code points beyond Unicode range are replaced.
* Sequences encoding code points beyond the BMP produce a surrogate
pair.
* Sequences encoding surrogate code points are replaced.
* Sequences encoding noncharacters are replaced.
* ASCII DEL is now always escaped.
The replacement character is U+FFFD.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Diffstat (limited to 'hw/lm32/milkymist.c')
0 files changed, 0 insertions, 0 deletions