Diffstat (limited to 'docs')
-rw-r--r-- docs/qapi-code-gen.txt | 15
-rw-r--r-- docs/specs/qcow2.txt   | 221
-rw-r--r-- docs/throttle.txt      | 252
3 files changed, 480 insertions, 8 deletions
diff --git a/docs/qapi-code-gen.txt b/docs/qapi-code-gen.txt
index 128f074a2d..999f3b98f0 100644
--- a/docs/qapi-code-gen.txt
+++ b/docs/qapi-code-gen.txt
@@ -187,11 +187,11 @@ prevent incomplete include files.
Usage: { 'struct': STRING, 'data': DICT, '*base': STRUCT-NAME }
-A struct is a dictionary containing a single 'data' key whose
-value is a dictionary. This corresponds to a struct in C or an Object
-in JSON. Each value of the 'data' dictionary must be the name of a
-type, or a one-element array containing a type name. An example of a
-struct is:
+A struct is a dictionary containing a single 'data' key whose value is
+a dictionary; the dictionary may be empty. This corresponds to a
+struct in C or an Object in JSON. Each value of the 'data' dictionary
+must be the name of a type, or a one-element array containing a type
+name. An example of a struct is:
{ 'struct': 'MyType',
'data': { 'member1': 'str', 'member2': 'int', '*member3': 'str' } }
@@ -288,9 +288,10 @@ or: { 'union': STRING, 'data': DICT, 'base': STRUCT-NAME,
Union types are used to let the user choose between several different
variants for an object. There are two flavors: simple (no
-discriminator or base), flat (both discriminator and base). A union
+discriminator or base), and flat (both discriminator and base). A union
type is defined using a data dictionary as explained in the following
-paragraphs.
+paragraphs. The data dictionary for either type of union must not
+be empty.
A simple union type defines a mapping from automatic discriminator
values to data types like in this example:
diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
index f236d8c6d9..80cdfd0e91 100644
--- a/docs/specs/qcow2.txt
+++ b/docs/specs/qcow2.txt
@@ -103,7 +103,18 @@ in the description of a field.
write to an image with unknown auto-clear features if it
clears the respective bits from this field first.
- Bits 0-63: Reserved (set to 0)
+ Bit 0: Bitmaps extension bit
+ This bit indicates consistency for the bitmaps
+ extension data.
+
+ It is an error if this bit is set without the
+ bitmaps extension present.
+
+ If the bitmaps extension is present but this
+ bit is unset, the bitmaps extension data must be
+ considered inconsistent.
+
+ Bits 1-63: Reserved (set to 0)
96 - 99: refcount_order
Describes the width of a reference count block entry (width
@@ -123,6 +134,7 @@ be stored. Each extension has a structure like the following:
0x00000000 - End of the header extension area
0xE2792ACA - Backing file format name
0x6803f857 - Feature name table
+ 0x23852875 - Bitmaps extension
other - Unknown header extension, can be safely
ignored
@@ -166,6 +178,36 @@ the header extension data. Each entry look like this:
terminated if it has full length)
+== Bitmaps extension ==
+
+The bitmaps extension is an optional header extension. It provides the ability
+to store bitmaps related to a virtual disk. For now, there is only one bitmap
+type: the dirty tracking bitmap, which tracks virtual disk changes from some
+point in time.
+
+The data of the extension should be considered consistent only if the
+corresponding auto-clear feature bit is set, see autoclear_features above.
+
+The fields of the bitmaps extension are:
+
+ Byte 0 - 3: nb_bitmaps
+ The number of bitmaps contained in the image. Must be
+ greater than or equal to 1.
+
+ Note: Qemu currently only supports up to 65535 bitmaps per
+ image.
+
+ 4 - 7: Reserved, must be zero.
+
+ 8 - 15: bitmap_directory_size
+ Size of the bitmap directory in bytes. It is the cumulative
+ size of all (nb_bitmaps) bitmap headers.
+
+ 16 - 23: bitmap_directory_offset
+ Offset into the image file at which the bitmap directory
+ starts. Must be aligned to a cluster boundary.
+
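The following C sketch decodes the 24-byte extension data and checks the constraints listed above. It assumes the usual qcow2 convention that all header fields are stored big-endian, and that cluster_size is the image's power-of-two cluster size; the names BitmapsExt, be32, be64 and parse_bitmaps_ext are hypothetical and not taken from QEMU.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded form of the bitmaps header extension. */
typedef struct {
    uint32_t nb_bitmaps;
    uint64_t bitmap_directory_size;
    uint64_t bitmap_directory_offset;
} BitmapsExt;

/* Read big-endian integers, as used for qcow2 header fields. */
static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t be64(const uint8_t *p)
{
    return ((uint64_t)be32(p) << 32) | be32(p + 4);
}

/* Parse the extension data; returns false if a constraint is violated. */
static bool parse_bitmaps_ext(const uint8_t *data, size_t len,
                              uint64_t cluster_size, BitmapsExt *out)
{
    if (len < 24) {
        return false;                   /* too short for the fields above */
    }
    out->nb_bitmaps = be32(data);
    out->bitmap_directory_size = be64(data + 8);
    out->bitmap_directory_offset = be64(data + 16);

    if (out->nb_bitmaps < 1) {
        return false;                   /* at least one bitmap is required */
    }
    if (be32(data + 4) != 0) {
        return false;                   /* bytes 4 - 7 are reserved */
    }
    /* cluster_size is a power of two, so a mask tests the alignment */
    if (out->bitmap_directory_offset & (cluster_size - 1)) {
        return false;
    }
    return true;
}
```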
+
== Host cluster management ==
qcow2 manages the allocation of host clusters by maintaining a reference count
@@ -360,3 +402,180 @@ Snapshot table entry:
variable: Padding to round up the snapshot table entry size to the
next multiple of 8.
+
+
+== Bitmaps ==
+
+As mentioned above, the bitmaps extension provides the ability to store bitmaps
+related to a virtual disk. This section describes how these bitmaps are stored.
+
+All stored bitmaps are related to the virtual disk stored in the same image, so
+each bitmap size is equal to the virtual disk size.
+
+Each bit of the bitmap is responsible for a strictly defined range of the virtual
+disk. For bit number bit_nr the corresponding range (in bytes) is:
+
+ [bit_nr * bitmap_granularity .. (bit_nr + 1) * bitmap_granularity - 1]
+
+Granularity is a property of each individual bitmap; see below.
+
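A minimal C sketch of the bit-to-range mapping, using granularity = 1 << granularity_bits as defined for the bitmap directory entry. The helper names are hypothetical, and bitmap_bits assumes that the last bit may cover a partial range when the disk size is not a multiple of the granularity.

```c
#include <stdint.h>

/* granularity = 1 << granularity_bits (see the bitmap directory entry) */
static uint64_t granularity_of(unsigned granularity_bits)
{
    return UINT64_C(1) << granularity_bits;
}

/* First and last byte of the virtual disk covered by bit bit_nr. */
static void bit_range(uint64_t bit_nr, uint64_t granularity,
                      uint64_t *first, uint64_t *last)
{
    *first = bit_nr * granularity;
    *last = (bit_nr + 1) * granularity - 1;
}

/* Bits needed to cover a disk of disk_size bytes, rounding up. */
static uint64_t bitmap_bits(uint64_t disk_size, uint64_t granularity)
{
    return (disk_size + granularity - 1) / granularity;
}
```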
+
+=== Bitmap directory ===
+
+Each bitmap saved in the image is described in a bitmap directory entry. The
+bitmap directory is a contiguous area in the image file, whose starting offset
+and length are given by the header extension fields bitmap_directory_offset and
+bitmap_directory_size. The entries of the bitmap directory have variable
+length, depending on the lengths of the bitmap name and extra data. These
+entries are also called bitmap headers.
+
+Structure of a bitmap directory entry:
+
+ Byte 0 - 7: bitmap_table_offset
+ Offset into the image file at which the bitmap table
+ (described below) for the bitmap starts. Must be aligned to
+ a cluster boundary.
+
+ 8 - 11: bitmap_table_size
+ Number of entries in the bitmap table of the bitmap.
+
+ 12 - 15: flags
+ Bit
+ 0: in_use
+ The bitmap was not saved correctly and may be
+ inconsistent.
+
+ 1: auto
+ The bitmap must reflect all changes of the virtual
+ disk by any application that would write to this qcow2
+ file (including writes, snapshot switching, etc.). The
+ type of this bitmap must be 'dirty tracking bitmap'.
+
+ 2: extra_data_compatible
+ This flag is meaningful when the extra data is
+ unknown to the software (currently any extra data is
+ unknown to Qemu).
+ If it is set, the bitmap may be used as expected, but
+ the extra data must be left as is.
+ If it is not set, the bitmap must not be used, but
+ both it and its extra data must be left as is.
+
+ Bits 3 - 31 are reserved and must be 0.
+
+ 16: type
+ This field describes the type of the bitmap.
+ Values:
+ 1: Dirty tracking bitmap
+
+ Values 0, 2 - 255 are reserved.
+
+ 17: granularity_bits
+ Granularity bits. Valid values: 0 - 63.
+
+ Note: Qemu currently doesn't support granularity_bits
+ greater than 31.
+
+ Granularity is calculated as
+ granularity = 1 << granularity_bits
+
+ A bitmap's granularity is the number of bytes of the
+ virtual disk that one bit of the bitmap accounts for.
+
+ 18 - 19: name_size
+ Size of the bitmap name. Must be non-zero.
+
+ Note: Qemu currently doesn't support values greater than
+ 1023.
+
+ 20 - 23: extra_data_size
+ Size of type-specific extra data.
+
+ For now, as no extra data is defined, extra_data_size is
+ reserved and should be zero. If it is non-zero, the
+ behavior is defined by the extra_data_compatible flag.
+
+ variable: extra_data
+ Extra data for the bitmap, occupying extra_data_size bytes.
+ Extra data must never contain references to clusters or in
+ some other way allocate additional clusters.
+
+ variable: name
+ The name of the bitmap (not null terminated), occupying
+ name_size bytes. Must be unique among all bitmap names
+ within the bitmaps extension.
+
+ variable: Padding to round up the bitmap directory entry size to the
+ next multiple of 8. All bytes of the padding must be zero.
+
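Because the name and extra data make entries variable-length, a reader walks the directory by computing each entry's padded size. This hypothetical C sketch shows that computation, where the name starts, and the extra_data_compatible rule for software that understands no extra data; the macro and function names are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

#define BME_FIXED_SIZE 24                       /* bytes 0 - 23 of an entry */
#define BME_FLAG_EXTRA_DATA_COMPATIBLE (1u << 2)
#define BME_RESERVED_FLAGS (~UINT32_C(7))       /* flag bits 3 - 31 */

/* Size of one directory entry, including zero padding to 8 bytes. */
static uint64_t dir_entry_size(uint32_t extra_data_size, uint16_t name_size)
{
    uint64_t raw = BME_FIXED_SIZE + (uint64_t)extra_data_size + name_size;
    return (raw + 7) & ~UINT64_C(7);
}

/* The name follows the fixed fields and the extra data. */
static uint64_t dir_entry_name_offset(uint32_t extra_data_size)
{
    return BME_FIXED_SIZE + (uint64_t)extra_data_size;
}

/* May software that knows no extra data use this bitmap? */
static bool may_use_bitmap(uint32_t flags, uint32_t extra_data_size)
{
    if (flags & BME_RESERVED_FLAGS) {
        return false;                   /* reserved bits must be 0 */
    }
    if (extra_data_size != 0 && !(flags & BME_FLAG_EXTRA_DATA_COMPATIBLE)) {
        return false;                   /* unknown extra data forbids use */
    }
    return true;
}
```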
+
+=== Bitmap table ===
+
+Each bitmap is stored using a one-level structure (as opposed to the two-level
+structures used for refcounts and guest cluster mapping) for the mapping of
+bitmap data to host clusters. This structure is called the bitmap table.
+
+Each bitmap table has a variable size (stored in the bitmap directory entry)
+and may span multiple clusters; however, it must be contiguous in the image
+file.
+
+Structure of a bitmap table entry:
+
+ Bit 0: Reserved and must be zero if bits 9 - 55 are non-zero.
+ If bits 9 - 55 are zero:
+ 0: Cluster should be read as all zeros.
+ 1: Cluster should be read as all ones.
+
+ 1 - 8: Reserved and must be zero.
+
+ 9 - 55: Bits 9 - 55 of the host cluster offset. Must be aligned to
+ a cluster boundary. If the offset is 0, the cluster is
+ unallocated; in that case, bit 0 determines how this
+ cluster should be treated during reads.
+
+ 56 - 63: Reserved and must be zero.
+
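A sketch of decoding one bitmap table entry in C, assuming (as elsewhere in qcow2) that the entry is stored big-endian and has already been converted to host order. The masks follow the bit layout above; the enum and function names are hypothetical.

```c
#include <stdint.h>

#define BME_TE_OFFSET_MASK   UINT64_C(0x00fffffffffffe00) /* bits 9 - 55 */
#define BME_TE_ALL_ONES      UINT64_C(1)                  /* bit 0 */
#define BME_TE_RESERVED_MASK UINT64_C(0xff000000000001fe) /* bits 1-8, 56-63 */

typedef enum { BME_ZEROS, BME_ONES, BME_ALLOCATED, BME_INVALID } BmeEntryKind;

/* Classify an entry; for allocated clusters, store the host offset. */
static BmeEntryKind decode_table_entry(uint64_t entry, uint64_t cluster_size,
                                       uint64_t *host_offset)
{
    uint64_t offset = entry & BME_TE_OFFSET_MASK;

    if (entry & BME_TE_RESERVED_MASK) {
        return BME_INVALID;
    }
    if (offset == 0) {
        /* unallocated: bit 0 selects how the cluster reads back */
        return (entry & BME_TE_ALL_ONES) ? BME_ONES : BME_ZEROS;
    }
    if ((entry & BME_TE_ALL_ONES) || (offset & (cluster_size - 1))) {
        return BME_INVALID;     /* bit 0 must be 0; offset must be aligned */
    }
    *host_offset = offset;
    return BME_ALLOCATED;
}
```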
+
+=== Bitmap data ===
+
+As noted above, bitmap data is stored in separate clusters, described by the
+bitmap table. Given an offset (in bytes) into the bitmap data, the offset into
+the image file can be obtained as follows:
+
+ image_offset(bitmap_data_offset) =
+ bitmap_table[bitmap_data_offset / cluster_size] +
+ (bitmap_data_offset % cluster_size)
+
+This offset is not defined if bits 9 - 55 of the bitmap table entry are zero
+(see above).
+
+Given an offset byte_nr into the virtual disk and the bitmap's granularity, the
+offset (in bits) into the image file of the corresponding bit of the bitmap can
+be calculated as follows:
+
+ bit_offset(byte_nr) =
+ image_offset(byte_nr / granularity / 8) * 8 +
+ (byte_nr / granularity) % 8
+
+If the size of the bitmap data is not a multiple of the cluster size then the
+last cluster of the bitmap data contains some unused tail bits. These bits must
+be zero.
+
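The two formulas above translate directly into C. In this sketch, table[] is assumed to hold the host cluster offsets already extracted from the bitmap table entries (bits 9 - 55), and the caller is assumed to have checked that every entry it uses is allocated; the function names mirror the formulas but are otherwise hypothetical.

```c
#include <stdint.h>

/* image_offset(bitmap_data_offset) from the formula above. */
static uint64_t image_offset(const uint64_t *table, uint64_t cluster_size,
                             uint64_t bitmap_data_offset)
{
    return table[bitmap_data_offset / cluster_size] +
           bitmap_data_offset % cluster_size;
}

/* bit_offset(byte_nr): position in bits from the start of the image file. */
static uint64_t bit_offset(const uint64_t *table, uint64_t cluster_size,
                           uint64_t granularity, uint64_t byte_nr)
{
    uint64_t bit_nr = byte_nr / granularity;    /* index of the bitmap bit */
    return image_offset(table, cluster_size, bit_nr / 8) * 8 + bit_nr % 8;
}
```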
+
+=== Dirty tracking bitmaps ===
+
+Bitmaps with 'type' field equal to one are dirty tracking bitmaps.
+
+When the virtual disk is in use, a dirty tracking bitmap may be 'enabled' or
+'disabled'. While the bitmap is 'enabled', all writes to the virtual disk
+should be reflected in the bitmap. A set bit in the bitmap means that the
+corresponding range of the virtual disk (see above) was written to while the
+bitmap was 'enabled'. An unset bit means that this range was not written to.
+
+The software does not have to sync the bitmap in the image file with its
+representation in RAM after each write. The 'in_use' flag should be set while
+the bitmap is not synced.
+
+In the image file the 'enabled' state is reflected by the 'auto' flag. If this
+flag is set, the software must consider the bitmap as 'enabled' and start
+tracking virtual disk changes to this bitmap from the first write to the
+virtual disk. If this flag is not set then the bitmap is disabled.
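As a hypothetical illustration of the 'enabled' semantics, the C sketch below sets the bits covering a guest write in an in-RAM copy of the bitmap. The DirtyBitmap struct and on_guest_write function are invented for this example, and numbering bits from the least significant bit of each byte is an assumption that this section does not spell out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical in-RAM representation of one dirty tracking bitmap. */
typedef struct {
    uint8_t *data;          /* one bit per granularity-sized range */
    uint64_t granularity;   /* bytes of virtual disk per bit */
    bool enabled;           /* reflected in the image by the 'auto' flag */
} DirtyBitmap;

/* Record a guest write of 'bytes' bytes starting at 'offset'. */
static void on_guest_write(DirtyBitmap *bm, uint64_t offset, uint64_t bytes)
{
    if (!bm->enabled || bytes == 0) {
        return;             /* a disabled bitmap does not track changes */
    }
    uint64_t first = offset / bm->granularity;
    uint64_t last = (offset + bytes - 1) / bm->granularity;
    for (uint64_t bit = first; bit <= last; bit++) {
        bm->data[bit / 8] |= (uint8_t)(1u << (bit % 8));
    }
}
```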
diff --git a/docs/throttle.txt b/docs/throttle.txt
new file mode 100644
index 0000000000..28204e46ca
--- /dev/null
+++ b/docs/throttle.txt
@@ -0,0 +1,252 @@
+The QEMU throttling infrastructure
+==================================
+Copyright (C) 2016 Igalia, S.L.
+Author: Alberto Garcia <berto@igalia.com>
+
+This work is licensed under the terms of the GNU GPL, version 2 or
+later. See the COPYING file in the top-level directory.
+
+Introduction
+------------
+QEMU includes a throttling module that can be used to set limits on
+I/O operations. The code itself is generic and independent of the I/O
+units, but it is currently used to limit the number of bytes per second
+and operations per second (IOPS) when performing disk I/O.
+
+This document explains how to use the throttling code in QEMU, and how
+it works internally. The implementation is in throttle.c.
+
+
+Using throttling to limit disk I/O
+----------------------------------
+Two aspects of the disk I/O can be limited: the number of bytes per
+second and the number of operations per second (IOPS). For each one of
+them the user can set a global limit or separate limits for read and
+write operations. This gives us a total of six different parameters.
+
+I/O limits can be set using the throttling.* parameters of -drive, or
+using the QMP 'block_set_io_throttle' command. These are the names of
+the parameters for both cases:
+
+|-----------------------+-----------------------|
+| -drive                | block_set_io_throttle |
+|-----------------------+-----------------------|
+| throttling.iops-total | iops                  |
+| throttling.iops-read  | iops_rd               |
+| throttling.iops-write | iops_wr               |
+| throttling.bps-total  | bps                   |
+| throttling.bps-read   | bps_rd                |
+| throttling.bps-write  | bps_wr                |
+|-----------------------+-----------------------|
+
+It is possible to set limits for both IOPS and bps at the same time,
+and for each case we can decide whether to have separate read and
+write limits or not, but note that if iops-total is set then neither
+iops-read nor iops-write can be set. The same applies to bps-total and
+bps-read/write.
+
+The default value of these parameters is 0, and it means 'unlimited'.
+
+In its most basic usage, the user can add a drive to QEMU with a limit
+of 100 IOPS with the following -drive line:
+
+ -drive file=hd0.qcow2,throttling.iops-total=100
+
+We can do the same using QMP. In this case all these parameters are
+mandatory, so we must set to 0 the ones that we don't want to limit:
+
+ { "execute": "block_set_io_throttle",
+ "arguments": {
+ "device": "virtio0",
+ "iops": 100,
+ "iops_rd": 0,
+ "iops_wr": 0,
+ "bps": 0,
+ "bps_rd": 0,
+ "bps_wr": 0
+ }
+ }
+
+
+I/O bursts
+----------
+In addition to the basic limits we have just seen, QEMU allows the
+user to do bursts of I/O for a configurable amount of time. A burst is
+an amount of I/O that can exceed the basic limit. Bursts are useful to
+allow better performance when there are peaks of activity (the OS
+boots, a service needs to be restarted) while keeping the average
+limits lower the rest of the time.
+
+Two parameters control bursts: their length and the maximum amount of
+I/O they allow. These two can be configured separately for each one of
+the six basic parameters described in the previous section, but in
+this section we'll use 'iops-total' as an example.
+
+The I/O limit during bursts is set using 'iops-total-max', and the
+maximum length (in seconds) is set with 'iops-total-max-length'. So if
+we want to configure a drive with a basic limit of 100 IOPS and allow
+bursts of 2000 IOPS for 60 seconds, we would do it like this (the line
+is split for clarity):
+
+ -drive file=hd0.qcow2,
+ throttling.iops-total=100,
+ throttling.iops-total-max=2000,
+ throttling.iops-total-max-length=60
+
+Or, with QMP:
+
+ { "execute": "block_set_io_throttle",
+ "arguments": {
+ "device": "virtio0",
+ "iops": 100,
+ "iops_rd": 0,
+ "iops_wr": 0,
+ "bps": 0,
+ "bps_rd": 0,
+ "bps_wr": 0,
+ "iops_max": 2000,
+ "iops_max_length": 60
+ }
+ }
+
+With this, the user can perform I/O on hd0.qcow2 at a rate of 2000
+IOPS for 1 minute before it's throttled down to 100 IOPS.
+
+The user will be able to do bursts again if there's a sufficiently
+long period of time with unused I/O (see below for details).
+
+The default value for 'iops-total-max' is 0 and it means that bursts
+are not allowed. 'iops-total-max-length' can only be set if
+'iops-total-max' is set as well, and its default value is 1 second.
+
+Here's the complete list of parameters for configuring bursts:
+
+|----------------------------------+-----------------------|
+| -drive                           | block_set_io_throttle |
+|----------------------------------+-----------------------|
+| throttling.iops-total-max        | iops_max              |
+| throttling.iops-total-max-length | iops_max_length       |
+| throttling.iops-read-max         | iops_rd_max           |
+| throttling.iops-read-max-length  | iops_rd_max_length    |
+| throttling.iops-write-max        | iops_wr_max           |
+| throttling.iops-write-max-length | iops_wr_max_length    |
+| throttling.bps-total-max         | bps_max               |
+| throttling.bps-total-max-length  | bps_max_length        |
+| throttling.bps-read-max          | bps_rd_max            |
+| throttling.bps-read-max-length   | bps_rd_max_length     |
+| throttling.bps-write-max         | bps_wr_max            |
+| throttling.bps-write-max-length  | bps_wr_max_length     |
+|----------------------------------+-----------------------|
+
+
+Controlling the size of I/O operations
+--------------------------------------
+When applying IOPS limits, all I/O operations are treated equally
+regardless of their size. This means that a user could circumvent the
+limits by submitting one huge I/O request instead of several smaller
+ones.
+
+QEMU provides a setting called throttling.iops-size to prevent this
+from happening. This setting specifies the size (in bytes) of an I/O
+request for accounting purposes. Larger requests will be counted
+proportionally to this size.
+
+For example, if iops-size is set to 4096 then an 8KB request will be
+counted as two, and a 6KB request will be counted as one and a
+half. This only applies to requests larger than iops-size: smaller
+requests will always be counted as one, no matter their size.
+
+The default value of iops-size is 0 and it means that the size of the
+requests is never taken into account when applying IOPS limits.
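The accounting rule reduces to a few lines. This C sketch (the function name is hypothetical, not QEMU's) returns how many operations a request of a given size counts as against an IOPS limit.

```c
#include <stdint.h>

/* Operations charged for a request of 'bytes', given iops-size. */
static double accounted_ops(uint64_t bytes, uint64_t iops_size)
{
    if (iops_size == 0 || bytes <= iops_size) {
        return 1.0;     /* size ignored, or request not larger than iops-size */
    }
    return (double)bytes / (double)iops_size;   /* proportional charge */
}
```

With iops_size = 4096, an 8192-byte request is charged 2.0 and a 6144-byte request 1.5, matching the example above.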
+
+
+Applying I/O limits to groups of disks
+--------------------------------------
+In all the examples so far we have seen how to apply limits to the I/O
+performed on individual drives, but QEMU allows grouping drives so
+they all share the same limits.
+
+The way it works is that each drive with I/O limits is assigned to a
+group named using the throttling.group parameter. If this parameter is
+not specified, then the device name (e.g. 'virtio0', 'ide0-hd0') will
+be used as the group name.
+
+Limits set using the throttling.* parameters discussed earlier in this
+document apply to the combined I/O of all members of a group.
+
+Consider this example:
+
+ -drive file=hd1.qcow2,throttling.iops-total=6000,throttling.group=foo
+ -drive file=hd2.qcow2,throttling.iops-total=6000,throttling.group=foo
+ -drive file=hd3.qcow2,throttling.iops-total=3000,throttling.group=bar
+ -drive file=hd4.qcow2,throttling.iops-total=6000,throttling.group=foo
+ -drive file=hd5.qcow2,throttling.iops-total=3000,throttling.group=bar
+ -drive file=hd6.qcow2,throttling.iops-total=5000
+
+Here hd1, hd2 and hd4 are all members of a group named 'foo' with a
+combined IOPS limit of 6000, and hd3 and hd5 are members of 'bar'. hd6
+is left alone (technically it is part of a 1-member group).
+
+Limits are applied in a round-robin fashion, so if there are concurrent
+I/O requests on several drives of the same group, they will be
+distributed evenly.
+
+When I/O limits are applied to an existing drive using the QMP command
+'block_set_io_throttle', the following things need to be taken into
+account:
+
+ - I/O limits are shared within the same group, so new values will
+ affect all members and overwrite the previous settings. In other
+ words: if different limits are applied to members of the same
+ group, the last one wins.
+
+ - If 'group' is unset it is assumed to be the current group of that
+ drive. If the drive is not in a group yet, it will be added to a
+ group named after the device name.
+
+ - If 'group' is set then the drive will be moved to that group if
+ it was a member of a different one. In this case the limits
+ specified in the parameters will be applied to the new group
+ only.
+
+ - I/O limits can be disabled by setting all of them to 0. In this
+ case the device will be removed from its group and the rest of
+ its members will not be affected. The 'group' parameter is
+ ignored.
+
+
+The Leaky Bucket algorithm
+--------------------------
+I/O limits in QEMU are implemented using the leaky bucket algorithm
+(specifically the "Leaky bucket as a meter" variant).
+
+This algorithm uses the analogy of a bucket that leaks water
+constantly. The water that gets into the bucket represents the I/O
+that has been performed, and no more I/O is allowed once the bucket is
+full.
+
+To see the way this corresponds to the throttling parameters in QEMU,
+consider the following values:
+
+ iops-total=100
+ iops-total-max=2000
+ iops-total-max-length=60
+
+ - Water leaks from the bucket at a rate of 100 IOPS.
+ - Water can be added to the bucket at a rate of 2000 IOPS.
+ - The size of the bucket is 2000 x 60 = 120000
+ - If 'iops-total-max' is unset then the bucket size is 100.
+
+The bucket is initially empty, therefore water can be added until it's
+full at a rate of 2000 IOPS (the burst rate). Once the bucket is full
+we can only add as much water as it leaks, therefore the I/O rate is
+reduced to 100 IOPS. If we add less water than it leaks then the
+bucket will start to empty, allowing for bursts again.
+
+Note that since water is leaking from the bucket even during bursts,
+it will take a bit more than 60 seconds at 2000 IOPS to fill it
+up. After those 60 seconds the bucket will have leaked 60 x 100 =
+6000, allowing for 3 more seconds of I/O at 2000 IOPS.
+
+Also, due to the way the algorithm works, longer bursts can be done at
+a lower I/O rate, e.g. 1000 IOPS for 120 seconds.
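To make the arithmetic above concrete, here is a deliberately simplified discrete-time model of the bucket in C. It is not the code in throttle.c: it assumes the guest submits I/O continuously at 'rate' operations per second, and it advances time in steps of dt seconds. The function name is hypothetical.

```c
/*
 * Simplified "leaky bucket as a meter" model: the bucket leaks 'avg'
 * ops/s, and I/O at 'rate' ops/s is admitted only while there is room.
 * Returns the seconds needed to fill a bucket of 'size' ops from empty.
 */
static double seconds_to_fill(double avg, double rate, double size, double dt)
{
    double level = 0.0, t = 0.0;

    while (level < size) {
        level -= avg * dt;              /* constant leak */
        if (level < 0.0) {
            level = 0.0;
        }
        double add = rate * dt;         /* I/O submitted in this step */
        if (add > size - level) {
            add = size - level;         /* the bucket cannot overflow */
        }
        level += add;
        t += dt;
    }
    return t;
}
```

With avg = 100, rate = 2000, size = 120000 and dt = 0.01, the model needs about 63.2 seconds to fill the bucket, a bit more than 60 as described above; with rate = 1000 it needs about 133 seconds, so a 120-second burst at 1000 IOPS fits.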