From abcd92db6a7b725e16826ee2e3fcb13cfe3a96c2 Mon Sep 17 00:00:00 2001 From: Paolo Bonzini Date: Fri, 28 Feb 2020 15:35:57 +0000 Subject: qemu-doc: move included files to docs/system MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Since qemu-doc.texi is mostly including files from docs/system, move the existing include files there for consistency. Signed-off-by: Paolo Bonzini Reviewed-by: Peter Maydell Reviewed-by: Alex Bennée Tested-by: Alex Bennée Signed-off-by: Peter Maydell Message-id: 20200228153619.9906-12-peter.maydell@linaro.org Message-id: 20200226113034.6741-12-pbonzini@redhat.com [PMM: update MAINTAINERS line for qemu-option-trace.texi] Reviewed-by: Peter Maydell Signed-off-by: Peter Maydell --- docs/security.texi | 167 ---------------- docs/system/deprecated.texi | 377 +++++++++++++++++++++++++++++++++++++ docs/system/qemu-option-trace.texi | 28 +++ docs/system/security.texi | 167 ++++++++++++++++ 4 files changed, 572 insertions(+), 167 deletions(-) delete mode 100644 docs/security.texi create mode 100644 docs/system/deprecated.texi create mode 100644 docs/system/qemu-option-trace.texi create mode 100644 docs/system/security.texi (limited to 'docs') diff --git a/docs/security.texi b/docs/security.texi deleted file mode 100644 index 0d6b30edfc..0000000000 --- a/docs/security.texi +++ /dev/null @@ -1,167 +0,0 @@ -@node Security -@chapter Security - -@section Overview - -This chapter explains the security requirements that QEMU is designed to meet -and principles for securely deploying QEMU. - -@section Security Requirements - -QEMU supports many different use cases, some of which have stricter security -requirements than others. The community has agreed on the overall security -requirements that users may depend on. These requirements define what is -considered supported from a security perspective. 
- -@subsection Virtualization Use Case - -The virtualization use case covers cloud and virtual private server (VPS) -hosting, as well as traditional data center and desktop virtualization. These -use cases rely on hardware virtualization extensions to execute guest code -safely on the physical CPU at close-to-native speed. - -The following entities are untrusted, meaning that they may be buggy or -malicious: - -@itemize -@item Guest -@item User-facing interfaces (e.g. VNC, SPICE, WebSocket) -@item Network protocols (e.g. NBD, live migration) -@item User-supplied files (e.g. disk images, kernels, device trees) -@item Passthrough devices (e.g. PCI, USB) -@end itemize - -Bugs affecting these entities are evaluated on whether they can cause damage in -real-world use cases and treated as security bugs if this is the case. - -@subsection Non-virtualization Use Case - -The non-virtualization use case covers emulation using the Tiny Code Generator -(TCG). In principle the TCG and device emulation code used in conjunction with -the non-virtualization use case should meet the same security requirements as -the virtualization use case. However, for historical reasons much of the -non-virtualization use case code was not written with these security -requirements in mind. - -Bugs affecting the non-virtualization use case are not considered security -bugs at this time. Users with non-virtualization use cases must not rely on -QEMU to provide guest isolation or any security guarantees. - -@section Architecture - -This section describes the design principles that ensure the security -requirements are met. - -@subsection Guest Isolation - -Guest isolation is the confinement of guest code to the virtual machine. When -guest code gains control of execution on the host this is called escaping the -virtual machine. Isolation also includes resource limits such as throttling of -CPU, memory, disk, or network. Guests must be unable to exceed their resource -limits. 
- -QEMU presents an attack surface to the guest in the form of emulated devices. -The guest must not be able to gain control of QEMU. Bugs in emulated devices -could allow malicious guests to gain code execution in QEMU. At this point the -guest has escaped the virtual machine and is able to act in the context of the -QEMU process on the host. - -Guests often interact with other guests and share resources with them. A -malicious guest must not gain control of other guests or access their data. -Disk image files and network traffic must be protected from other guests unless -explicitly shared between them by the user. - -@subsection Principle of Least Privilege - -The principle of least privilege states that each component only has access to -the privileges necessary for its function. In the case of QEMU this means that -each process only has access to resources belonging to the guest. - -The QEMU process should not have access to any resources that are inaccessible -to the guest. This way the guest does not gain anything by escaping into the -QEMU process since it already has access to those same resources from within -the guest. - -Following the principle of least privilege immediately fulfills guest isolation -requirements. For example, guest A only has access to its own disk image file -@code{a.img} and not guest B's disk image file @code{b.img}. - -In reality certain resources are inaccessible to the guest but must be -available to QEMU to perform its function. For example, host system calls are -necessary for QEMU but are not exposed to guests. A guest that escapes into -the QEMU process can then begin invoking host system calls. - -New features must be designed to follow the principle of least privilege. -Should this not be possible for technical reasons, the security risk must be -clearly documented so users are aware of the trade-off of enabling the feature. 
- -@subsection Isolation mechanisms - -Several isolation mechanisms are available to realize this architecture of -guest isolation and the principle of least privilege. With the exception of -Linux seccomp, these mechanisms are all deployed by management tools that -launch QEMU, such as libvirt. They are also platform-specific so they are only -described briefly for Linux here. - -The fundamental isolation mechanism is that QEMU processes must run as -unprivileged users. Sometimes it seems more convenient to launch QEMU as -root to give it access to host devices (e.g. @code{/dev/net/tun}) but this poses a -huge security risk. File descriptor passing can be used to give an otherwise -unprivileged QEMU process access to host devices without running QEMU as root. -It is also possible to launch QEMU as a non-root user and configure UNIX groups -for access to @code{/dev/kvm}, @code{/dev/net/tun}, and other device nodes. -Some Linux distros already ship with UNIX groups for these devices by default. - -@itemize -@item SELinux and AppArmor make it possible to confine processes beyond the -traditional UNIX process and file permissions model. They restrict the QEMU -process from accessing processes and files on the host system that are not -needed by QEMU. - -@item Resource limits and cgroup controllers provide throughput and utilization -limits on key resources such as CPU time, memory, and I/O bandwidth. - -@item Linux namespaces can be used to make process, file system, and other system -resources unavailable to QEMU. A namespaced QEMU process is restricted to only -those resources that were granted to it. - -@item Linux seccomp is available via the QEMU @option{--sandbox} option. It disables -system calls that are not needed by QEMU, thereby reducing the host kernel -attack surface. -@end itemize - -@section Sensitive configurations - -There are aspects of QEMU that can have security implications which users & -management applications must be aware of. 
- -@subsection Monitor console (QMP and HMP) - -The monitor console (whether used with QMP or HMP) provides an interface -to dynamically control many aspects of QEMU's runtime operation. Many of the -commands exposed will instruct QEMU to access content on the host file system -and/or trigger spawning of external processes. - -For example, the @code{migrate} command allows for the spawning of arbitrary -processes for the purpose of tunnelling the migration data stream. The -@code{blockdev-add} command instructs QEMU to open arbitrary files, exposing -their content to the guest as a virtual disk. - -Unless QEMU is otherwise confined using technologies such as SELinux, AppArmor, -or Linux namespaces, the monitor console should be considered to have privileges -equivalent to those of the user account QEMU is running under. - -It is further important to consider the security of the character device backend -over which the monitor console is exposed. It needs to have protection against -malicious third parties which might try to make unauthorized connections, or -perform man-in-the-middle attacks. Many of the character device backends do not -satisfy this requirement and so must not be used for the monitor console. - -The general recommendation is that the monitor console should be exposed over -a UNIX domain socket backend to the local host only. Use of the TCP based -character device backend is inappropriate unless configured to use both TLS -encryption and authorization control policy on client connections. - -In summary, the monitor console is considered a privileged control interface to -QEMU and as such should only be made accessible to a trusted management -application or user. 
diff --git a/docs/system/deprecated.texi b/docs/system/deprecated.texi new file mode 100644 index 0000000000..66eca3a1de --- /dev/null +++ b/docs/system/deprecated.texi @@ -0,0 +1,377 @@ +@node Deprecated features +@appendix Deprecated features + +In general features are intended to be supported indefinitely once +introduced into QEMU. In the event that a feature needs to be removed, +it will be listed in this appendix. The feature will remain functional +for 2 releases prior to actual removal. Deprecated features may also +generate warnings on the console when QEMU starts up, or if activated +via a monitor command, however, this is not a mandatory requirement. + +Prior to the 2.10.0 release there was no official policy on how +long features would be deprecated prior to their removal, nor +any documented list of which features were deprecated. Thus +any features deprecated prior to 2.10.0 will be treated as if +they were first deprecated in the 2.10.0 release. + +What follows is a list of all features currently marked as +deprecated. + +@section System emulator command line arguments + +@subsection -machine enforce-config-section=on|off (since 3.1) + +The @option{enforce-config-section} parameter is replaced by the +@option{-global migration.send-configuration=@var{on|off}} option. + +@subsection -no-kvm (since 1.3.0) + +The ``-no-kvm'' argument is now a synonym for setting ``-accel tcg''. + +@subsection -usbdevice (since 2.10.0) + +The ``-usbdevice DEV'' argument is now a synonym for setting +the ``-device usb-DEV'' argument instead. The deprecated syntax +would automatically enable USB support on the machine type. +If using the new syntax, USB support must be explicitly +enabled via the ``-machine usb=on'' argument. + +@subsection -drive file=json:@{...@{'driver':'file'@}@} (since 3.0) + +The 'file' driver for drives is no longer appropriate for character or host +devices and will only accept regular files (S_IFREG). 
The correct driver +for these file types is 'host_cdrom' or 'host_device' as appropriate. + +@subsection -net ...,name=@var{name} (since 3.1) + +The @option{name} parameter of the @option{-net} option is a synonym +for the @option{id} parameter, which should now be used instead. + +@subsection -smp (invalid topologies) (since 3.1) + +CPU topology properties should describe whole machine topology including +possible CPUs. + +However, historically it was possible to start QEMU with an incorrect topology +where @math{@var{n} <= @var{sockets} * @var{cores} * @var{threads} < @var{maxcpus}}, +which could lead to an incorrect topology enumeration by the guest. +Support for invalid topologies will be removed; the user must ensure +topologies described with -smp include all possible CPUs, i.e. + @math{@var{sockets} * @var{cores} * @var{threads} = @var{maxcpus}}. + +@subsection -vnc acl (since 4.0.0) + +The @code{acl} option to the @code{-vnc} argument has been replaced +by the @code{tls-authz} and @code{sasl-authz} options. + +@subsection QEMU_AUDIO_ environment variables and -audio-help (since 4.0) + +The ``-audiodev'' argument is now the preferred way to specify audio +backend settings instead of environment variables. To ease migration to +the new format, the ``-audio-help'' option can be used to convert +the current values of the environment variables to ``-audiodev'' options. + +@subsection Creating sound card devices and vnc without audiodev= property (since 4.2) + +When not using the deprecated legacy audio config, each sound card +should specify an @code{audiodev=} property. Additionally, when using +vnc, you should specify an @code{audiodev=} property if you plan to +transmit audio through the VNC protocol. + +@subsection -mon ...,control=readline,pretty=on|off (since 4.1) + +The @code{pretty=on|off} switch has no effect for HMP monitors, but is +silently ignored. Using the switch with HMP monitors will become an +error in the future.
+ +@subsection -realtime (since 4.1) + +The @code{-realtime mlock=on|off} argument has been replaced by the +@code{-overcommit mem-lock=on|off} argument. + +@subsection -numa node,mem=@var{size} (since 4.1) + +The parameter @option{mem} of @option{-numa node} is used to assign a part of +guest RAM to a NUMA node. But when using it, it's impossible to manage the specified +RAM chunk on the host side (like binding it to a host node, setting a bind policy, ...), +so the guest ends up with a fake NUMA configuration with suboptimal performance. +However, since 2014 there has been an alternative way to assign RAM to a NUMA node +using parameter @option{memdev}, which does the same as @option{mem} and adds +means to actually manage node RAM on the host side. Use parameter @option{memdev} +with @var{memory-backend-ram} backend as a replacement for parameter @option{mem} +to achieve the same fake NUMA effect or a properly configured +@var{memory-backend-file} backend to actually benefit from NUMA configuration. +In future, new machine versions will not accept the option but it will still +work with old machine types. Users can check the QAPI schema to see if the legacy +option is supported by looking at the MachineInfo::numa-mem-supported property. + +@subsection -numa node (without memory specified) (since 4.1) + +Splitting RAM by default between NUMA nodes has the same issues as the @option{mem} +parameter described above, with the difference that QEMU plays the role of the +user, applying an implicit generic or board-specific splitting rule. +Use @option{memdev} with @var{memory-backend-ram} backend or @option{mem} (if +it's supported by the machine type in use) to define the mapping explicitly instead. + +@subsection RISC-V -bios (since 4.1) + +QEMU 4.1 introduced support for the -bios option in QEMU for RISC-V for the +RISC-V virt machine and sifive_u machine. + +QEMU 4.1 has no changes to the default behaviour to avoid breakages. This +default will change in a future QEMU release, so please prepare now.
All users +of the virt or sifive_u machine must change their command line usage. + +QEMU 4.1 has three options; please migrate to one of these three: + 1. ``-bios none`` - This is the current default behaviour if no -bios option + is included. QEMU will not automatically load any firmware. It is up + to the user to load all the images they need. + 2. ``-bios default`` - In a future QEMU release this will become the default + behaviour if no -bios option is specified. This option will load the + default OpenSBI firmware automatically. The firmware is included with + the QEMU release and no user interaction is required. All a user needs + to do is specify the kernel they want to boot with the -kernel option. + 3. ``-bios `` - Tells QEMU to load the specified file as the firmware. + +@subsection -tb-size option (since 5.0) + +QEMU 5.0 introduced an alternative syntax to specify the size of the translation +block cache, @option{-accel tcg,tb-size=}. The new syntax deprecates the +previously available @option{-tb-size} option. + +@subsection -show-cursor option (since 5.0) + +Use @option{-display sdl,show-cursor=on} or + @option{-display gtk,show-cursor=on} instead. + +@section QEMU Machine Protocol (QMP) commands + +@subsection change (since 2.5.0) + +Use ``blockdev-change-medium'' or ``change-vnc-password'' instead. + +@subsection migrate_set_downtime and migrate_set_speed (since 2.8.0) + +Use ``migrate-set-parameters'' instead. + +@subsection migrate-set-cache-size and query-migrate-cache-size (since 2.11.0) + +Use ``migrate-set-parameters'' and ``query-migrate-parameters'' instead. + +@subsection query-block result field dirty-bitmaps[i].status (since 4.0) + +The ``status'' field of the ``BlockDirtyInfo'' structure, returned by +the query-block command is deprecated. Two new boolean fields, +``recording'' and ``busy'' effectively replace it.
+ +@subsection query-block result field dirty-bitmaps (since 4.2) + +The ``dirty-bitmaps`` field of the ``BlockInfo`` structure, returned by +the query-block command is itself now deprecated. The ``dirty-bitmaps`` +field of the ``BlockDeviceInfo`` struct should be used instead, which is the +type of the ``inserted`` field in query-block replies, as well as the +type of array items in query-named-block-nodes. + +Since the ``dirty-bitmaps`` field is optionally present in both the old and +new locations, clients must use introspection to learn where to anticipate +the field if/when it does appear in command output. + +@subsection query-cpus (since 2.12.0) + +The ``query-cpus'' command is replaced by the ``query-cpus-fast'' command. + +@subsection query-cpus-fast "arch" output member (since 3.0.0) + +The ``arch'' output member of the ``query-cpus-fast'' command is +replaced by the ``target'' output member. + +@subsection cpu-add (since 4.0) + +Use ``device_add'' for hotplugging vCPUs instead of ``cpu-add''. See +documentation of ``query-hotpluggable-cpus'' for additional +details. + +@subsection query-events (since 4.0) + +The ``query-events'' command has been superseded by the more powerful +and accurate ``query-qmp-schema'' command. + +@subsection chardev client socket with 'wait' option (since 4.0) + +Character devices creating sockets in client mode should not specify +the 'wait' field, which is only applicable to sockets in server mode. + +@section Human Monitor Protocol (HMP) commands + +@subsection The hub_id parameter of 'hostfwd_add' / 'hostfwd_remove' (since 3.1) + +The @option{[hub_id name]} parameter tuple of the 'hostfwd_add' and +'hostfwd_remove' HMP commands has been replaced by @option{netdev_id}. + +@subsection cpu-add (since 4.0) + +Use ``device_add'' for hotplugging vCPUs instead of ``cpu-add''. See +documentation of ``query-hotpluggable-cpus'' for additional details.
+ +@subsection acl_show, acl_reset, acl_policy, acl_add, acl_remove (since 4.0.0) + +The ``acl_show'', ``acl_reset'', ``acl_policy'', ``acl_add'', and +``acl_remove'' commands are deprecated with no replacement. Authorization +for VNC should be performed using the pluggable QAuthZ objects. + +@section Guest Emulator ISAs + +@subsection RISC-V ISA privilege specification version 1.09.1 (since 4.1) + +The RISC-V ISA privilege specification version 1.09.1 has been deprecated. +QEMU supports both the newer version 1.10.0 and the ratified version 1.11.0; these +should be used instead of the 1.09.1 version. + +@section System emulator CPUs + +@subsection RISC-V ISA CPUs (since 4.1) + +The RISC-V CPUs with the ISA version in the CPU name have been deprecated. The +four CPUs are: ``rv32gcsu-v1.9.1``, ``rv32gcsu-v1.10.0``, ``rv64gcsu-v1.9.1`` and +``rv64gcsu-v1.10.0``. Instead the version can be specified via the CPU ``priv_spec`` +option when using the ``rv32`` or ``rv64`` CPUs. + +@subsection RISC-V no MMU CPUs (since 4.1) + +The RISC-V no MMU CPUs have been deprecated. The two CPUs, ``rv32imacu-nommu`` and +``rv64imacu-nommu``, should no longer be used. Instead the MMU status can be specified +via the CPU ``mmu`` option when using the ``rv32`` or ``rv64`` CPUs. + +@section System emulator devices + +@subsection ide-drive (since 4.2) + +The 'ide-drive' device is deprecated. Users should use 'ide-hd' or +'ide-cd' as appropriate to get an IDE hard disk or CD-ROM as needed. + +@subsection scsi-disk (since 4.2) + +The 'scsi-disk' device is deprecated. Users should use 'scsi-hd' or +'scsi-cd' as appropriate to get a SCSI hard disk or CD-ROM as needed. + +@section System emulator machines + +@subsection mips r4k platform (since 5.0) + +This machine type is very old and unmaintained. Users should use the 'malta' +machine type instead.
+ +@subsection pc-1.0, pc-1.1, pc-1.2 and pc-1.3 (since 5.0) + +These machine types are very old and likely cannot be used for live migration +from old QEMU versions anymore. A newer machine type should be used instead. + +@subsection spike_v1.9.1 and spike_v1.10 (since 4.1) + +The version specific Spike machines have been deprecated in favour of the +generic ``spike`` machine. If you need to specify an older version of the RISC-V +spec you can use the ``-cpu rv64gcsu,priv_spec=v1.9.1`` command line argument. + +@section Device options + +@subsection Emulated device options + +@subsubsection -device virtio-blk,scsi=on|off (since 5.0.0) + +The virtio-blk SCSI passthrough feature is a legacy VIRTIO feature. VIRTIO 1.0 +and later do not support it because the virtio-scsi device was introduced for +full SCSI support. Use virtio-scsi instead when SCSI passthrough is required. + +Note this also applies to ``-device virtio-blk-pci,scsi=on|off'', which is an +alias. + +@subsection Block device options + +@subsubsection "backing": "" (since 2.12.0) + +In order to prevent QEMU from automatically opening an image's backing +chain, use ``"backing": null'' instead. + +@subsubsection rbd keyvalue pair encoded filenames: "" (since 3.1.0) + +Options for ``rbd'' should be specified according to its runtime options, +like other block drivers. Legacy parsing of keyvalue pair encoded +filenames is useful to open images with the old format for backing files; +these image files should be updated to use the current format. + +Example of legacy encoding: + +@code{json:@{"file.driver":"rbd", "file.filename":"rbd:rbd/name"@}} + +The above, converted to the currently supported format: + +@code{json:@{"file.driver":"rbd", "file.pool":"rbd", "file.image":"name"@}} + +@section Related binaries + +@subsection qemu-img convert -n -o (since 4.2.0) + +All options specified in @option{-o} are image creation options, so +they have no effect when used with @option{-n} to skip image creation.
+Silently ignored options can be confusing, so this combination of +options will be made an error in future versions. + +@section Backwards compatibility + +@subsection Runnability guarantee of CPU models (since 4.1.0) + +Previous versions of QEMU never changed existing CPU models in +ways that introduced additional host software or hardware +requirements to the VM. This allowed management software to +safely change the machine type of an existing VM without +introducing new requirements ("runnability guarantee"). This +prevented CPU models from being updated to include CPU +vulnerability mitigations, leaving guests vulnerable in the +default configuration. + +The CPU model runnability guarantee won't apply anymore to +existing CPU models. Management software that needs runnability +guarantees must resolve the CPU model aliases using the +``alias-of'' field returned by the ``query-cpu-definitions'' QMP +command. + +While those guarantees are kept, the return value of +``query-cpu-definitions'' will have existing CPU model aliases +point to a version that doesn't break runnability guarantees +(specifically, version 1 of those CPU models). In future QEMU +versions, aliases will point to newer CPU model versions +depending on the machine type, so management software must +resolve CPU model aliases before starting a virtual machine. + + +@node Recently removed features +@appendix Recently removed features + +What follows is a list of recently removed, formerly deprecated +features, kept as a record for users who have encountered +trouble after a recent upgrade. + +@section QEMU Machine Protocol (QMP) commands + +@subsection block-dirty-bitmap-add "autoload" parameter (since 4.2.0) + +The "autoload" parameter has been ignored since 2.12.0. All bitmaps +are automatically loaded from qcow2 images.
+ +@section Related binaries + +@subsection qemu-nbd --partition (removed in 5.0.0) + +The ``qemu-nbd --partition $digit'' code (also spelled @option{-P}) +could only handle MBR partitions, and never correctly handled logical +partitions beyond partition 5. Exporting a partition can still be +done by utilizing the @option{--image-opts} option with a raw blockdev +using the @code{offset} and @code{size} parameters layered on top of +any other existing blockdev. For example, if partition 1 is 100MiB +long starting at 1MiB, the old command: + +@code{qemu-nbd -t -P 1 -f qcow2 file.qcow2} + +can be rewritten as: + +@code{qemu-nbd -t --image-opts driver=raw,offset=1M,size=100M,file.driver=qcow2,file.file.driver=file,file.file.filename=file.qcow2} diff --git a/docs/system/qemu-option-trace.texi b/docs/system/qemu-option-trace.texi new file mode 100644 index 0000000000..162f1528d2 --- /dev/null +++ b/docs/system/qemu-option-trace.texi @@ -0,0 +1,28 @@ +@c The contents of this file must be kept in sync with qemu-option-trace.rst.inc +@c until all the users of the texi file have been converted to rst and +@c the texi file can be removed. + +Specify tracing options. + +@table @option +@item [enable=]@var{pattern} +Immediately enable events matching @var{pattern} +(either event name or a globbing pattern). This option is only +available if QEMU has been compiled with the @var{simple}, @var{log} +or @var{ftrace} tracing backend. To specify multiple events or patterns, +specify the @option{-trace} option multiple times. + +Use @code{-trace help} to print a list of names of trace points. + +@item events=@var{file} +Immediately enable events listed in @var{file}. +The file must contain one event name (as listed in the @file{trace-events-all} +file) per line; globbing patterns are accepted too. This option is only +available if QEMU has been compiled with the @var{simple}, @var{log} or +@var{ftrace} tracing backend. + +@item file=@var{file} +Log output traces to @var{file}. 
+This option is only available if QEMU has been compiled with +the @var{simple} tracing backend. +@end table diff --git a/docs/system/security.texi b/docs/system/security.texi new file mode 100644 index 0000000000..0d6b30edfc --- /dev/null +++ b/docs/system/security.texi @@ -0,0 +1,167 @@ +@node Security +@chapter Security + +@section Overview + +This chapter explains the security requirements that QEMU is designed to meet +and principles for securely deploying QEMU. + +@section Security Requirements + +QEMU supports many different use cases, some of which have stricter security +requirements than others. The community has agreed on the overall security +requirements that users may depend on. These requirements define what is +considered supported from a security perspective. + +@subsection Virtualization Use Case + +The virtualization use case covers cloud and virtual private server (VPS) +hosting, as well as traditional data center and desktop virtualization. These +use cases rely on hardware virtualization extensions to execute guest code +safely on the physical CPU at close-to-native speed. + +The following entities are untrusted, meaning that they may be buggy or +malicious: + +@itemize +@item Guest +@item User-facing interfaces (e.g. VNC, SPICE, WebSocket) +@item Network protocols (e.g. NBD, live migration) +@item User-supplied files (e.g. disk images, kernels, device trees) +@item Passthrough devices (e.g. PCI, USB) +@end itemize + +Bugs affecting these entities are evaluated on whether they can cause damage in +real-world use cases and treated as security bugs if this is the case. + +@subsection Non-virtualization Use Case + +The non-virtualization use case covers emulation using the Tiny Code Generator +(TCG). In principle the TCG and device emulation code used in conjunction with +the non-virtualization use case should meet the same security requirements as +the virtualization use case. 
However, for historical reasons much of the +non-virtualization use case code was not written with these security +requirements in mind. + +Bugs affecting the non-virtualization use case are not considered security +bugs at this time. Users with non-virtualization use cases must not rely on +QEMU to provide guest isolation or any security guarantees. + +@section Architecture + +This section describes the design principles that ensure the security +requirements are met. + +@subsection Guest Isolation + +Guest isolation is the confinement of guest code to the virtual machine. When +guest code gains control of execution on the host this is called escaping the +virtual machine. Isolation also includes resource limits such as throttling of +CPU, memory, disk, or network. Guests must be unable to exceed their resource +limits. + +QEMU presents an attack surface to the guest in the form of emulated devices. +The guest must not be able to gain control of QEMU. Bugs in emulated devices +could allow malicious guests to gain code execution in QEMU. At this point the +guest has escaped the virtual machine and is able to act in the context of the +QEMU process on the host. + +Guests often interact with other guests and share resources with them. A +malicious guest must not gain control of other guests or access their data. +Disk image files and network traffic must be protected from other guests unless +explicitly shared between them by the user. + +@subsection Principle of Least Privilege + +The principle of least privilege states that each component only has access to +the privileges necessary for its function. In the case of QEMU this means that +each process only has access to resources belonging to the guest. + +The QEMU process should not have access to any resources that are inaccessible +to the guest. This way the guest does not gain anything by escaping into the +QEMU process since it already has access to those same resources from within +the guest. 
+ +Following the principle of least privilege immediately fulfills guest isolation +requirements. For example, guest A only has access to its own disk image file +@code{a.img} and not guest B's disk image file @code{b.img}. + +In reality certain resources are inaccessible to the guest but must be +available to QEMU to perform its function. For example, host system calls are +necessary for QEMU but are not exposed to guests. A guest that escapes into +the QEMU process can then begin invoking host system calls. + +New features must be designed to follow the principle of least privilege. +Should this not be possible for technical reasons, the security risk must be +clearly documented so users are aware of the trade-off of enabling the feature. + +@subsection Isolation mechanisms + +Several isolation mechanisms are available to realize this architecture of +guest isolation and the principle of least privilege. With the exception of +Linux seccomp, these mechanisms are all deployed by management tools that +launch QEMU, such as libvirt. They are also platform-specific so they are only +described briefly for Linux here. + +The fundamental isolation mechanism is that QEMU processes must run as +unprivileged users. Sometimes it seems more convenient to launch QEMU as +root to give it access to host devices (e.g. @code{/dev/net/tun}) but this poses a +huge security risk. File descriptor passing can be used to give an otherwise +unprivileged QEMU process access to host devices without running QEMU as root. +It is also possible to launch QEMU as a non-root user and configure UNIX groups +for access to @code{/dev/kvm}, @code{/dev/net/tun}, and other device nodes. +Some Linux distros already ship with UNIX groups for these devices by default. + +@itemize +@item SELinux and AppArmor make it possible to confine processes beyond the +traditional UNIX process and file permissions model. 
They restrict the QEMU +process from accessing processes and files on the host system that are not +needed by QEMU. + +@item Resource limits and cgroup controllers provide throughput and utilization +limits on key resources such as CPU time, memory, and I/O bandwidth. + +@item Linux namespaces can be used to make process, file system, and other system +resources unavailable to QEMU. A namespaced QEMU process is restricted to only +those resources that were granted to it. + +@item Linux seccomp is available via the QEMU @option{--sandbox} option. It disables +system calls that are not needed by QEMU, thereby reducing the host kernel +attack surface. +@end itemize + +@section Sensitive configurations + +There are aspects of QEMU that can have security implications which users & +management applications must be aware of. + +@subsection Monitor console (QMP and HMP) + +The monitor console (whether used with QMP or HMP) provides an interface +to dynamically control many aspects of QEMU's runtime operation. Many of the +commands exposed will instruct QEMU to access content on the host file system +and/or trigger spawning of external processes. + +For example, the @code{migrate} command allows for the spawning of arbitrary +processes for the purpose of tunnelling the migration data stream. The +@code{blockdev-add} command instructs QEMU to open arbitrary files, exposing +their content to the guest as a virtual disk. + +Unless QEMU is otherwise confined using technologies such as SELinux, AppArmor, +or Linux namespaces, the monitor console should be considered to have privileges +equivalent to those of the user account QEMU is running under. + +It is further important to consider the security of the character device backend +over which the monitor console is exposed. It needs to have protection against +malicious third parties which might try to make unauthorized connections, or +perform man-in-the-middle attacks. 
Many of the character device backends do not +satisfy this requirement and so must not be used for the monitor console. + +The general recommendation is that the monitor console should be exposed over +a UNIX domain socket backend to the local host only. Use of the TCP based +character device backend is inappropriate unless configured to use both TLS +encryption and authorization control policy on client connections. + +In summary, the monitor console is considered a privileged control interface to +QEMU and as such should only be made accessible to a trusted management +application or user. -- cgit v1.2.3