aboutsummaryrefslogtreecommitdiff
path: root/doc/table_format.md
diff options
context:
space:
mode:
authorPieter Wuille <pieter.wuille@gmail.com>2017-06-09 19:24:30 -0700
committerPieter Wuille <pieter.wuille@gmail.com>2017-06-09 19:24:30 -0700
commitcf44e4ca7762742c6c3154447b40869ec9d041db (patch)
treed5a89851da0a8ab07c40966ac137f4e299c80927 /doc/table_format.md
parent634ad517037b319147816f1d112b066528e1724a (diff)
downloadbitcoin-cf44e4ca7762742c6c3154447b40869ec9d041db.tar.xz
Squashed 'src/leveldb/' changes from a31c8aa40..196962ff0
196962ff0 Add AcceleratedCRC32C to port_win.h 1bdf1c34c Merge upstream LevelDB v1.20 d31721eb0 Merge #17: Fixed file sharing errors fecd44902 Fixed file sharing error in Win32Env::GetFileSize(), Win32SequentialFile::_Init(), Win32RandomAccessFile::_Init() Fixed error checking in Win32SequentialFile::_Init() 5b7510f1b Merge #14: Merge upstream LevelDB 1.19 0d969fd57 Merge #16: [LevelDB] Do no crash if filesystem can't fsync c8c029b5b [LevelDB] Do no crash if filesystem can't fsync a53934a3a Increase leveldb version to 1.20. f3f139737 Separate Env tests from PosixEnv tests. eb4f0972f leveldb: Fix compilation warnings in port_posix_sse.cc on x86 (32-bit). d0883b600 Fixed path to doc file: index.md. 7fa20948d Convert documentation to markdown. ea175e28f Implement support for Intel crc32 instruction (SSE 4.2) 95cd743e5 Including <limits> for std::numeric_limits. 646c3588d Limit the number of read-only files the POSIX Env will have open. d40bc3fa5 Merge #13: Typo ebbd772d3 Typo a2fb086d0 Add option for max file size. The currend hard-coded value of 2M is inefficient in colossus. git-subtree-dir: src/leveldb git-subtree-split: 196962ff01c39b4705d8117df5c3f8c205349950
Diffstat (limited to 'doc/table_format.md')
-rw-r--r--doc/table_format.md107
1 files changed, 107 insertions, 0 deletions
diff --git a/doc/table_format.md b/doc/table_format.md
new file mode 100644
index 0000000000..5fe7e72411
--- /dev/null
+++ b/doc/table_format.md
@@ -0,0 +1,107 @@
+leveldb File format
+===================
+
+ <beginning_of_file>
+ [data block 1]
+ [data block 2]
+ ...
+ [data block N]
+ [meta block 1]
+ ...
+ [meta block K]
+ [metaindex block]
+ [index block]
+ [Footer] (fixed size; starts at file_size - sizeof(Footer))
+ <end_of_file>
+
+The file contains internal pointers. Each such pointer is called
+a BlockHandle and contains the following information:
+
+ offset: varint64
+ size: varint64
+
+See [varints](https://developers.google.com/protocol-buffers/docs/encoding#varints)
+for an explanation of varint64 format.
+
+1. The sequence of key/value pairs in the file are stored in sorted
+order and partitioned into a sequence of data blocks. These blocks
+come one after another at the beginning of the file. Each data block
+is formatted according to the code in `block_builder.cc`, and then
+optionally compressed.
+
+2. After the data blocks we store a bunch of meta blocks. The
+supported meta block types are described below. More meta block types
+may be added in the future. Each meta block is again formatted using
+`block_builder.cc` and then optionally compressed.
+
+3. A "metaindex" block. It contains one entry for every other meta
+block where the key is the name of the meta block and the value is a
+BlockHandle pointing to that meta block.
+
+4. An "index" block. This block contains one entry per data block,
+where the key is a string >= last key in that data block and before
+the first key in the successive data block. The value is the
+BlockHandle for the data block.
+
+5. At the very end of the file is a fixed length footer that contains
+the BlockHandle of the metaindex and index blocks as well as a magic number.
+
+ metaindex_handle: char[p]; // Block handle for metaindex
+ index_handle: char[q]; // Block handle for index
+ padding: char[40-p-q];// zeroed bytes to make fixed length
+ // (40==2*BlockHandle::kMaxEncodedLength)
+ magic: fixed64; // == 0xdb4775248b80fb57 (little-endian)
+
+## "filter" Meta Block
+
+If a `FilterPolicy` was specified when the database was opened, a
+filter block is stored in each table. The "metaindex" block contains
+an entry that maps from `filter.<N>` to the BlockHandle for the filter
+block where `<N>` is the string returned by the filter policy's
+`Name()` method.
+
+The filter block stores a sequence of filters, where filter i contains
+the output of `FilterPolicy::CreateFilter()` on all keys that are stored
+in a block whose file offset falls within the range
+
+ [ i*base ... (i+1)*base-1 ]
+
+Currently, "base" is 2KB. So for example, if blocks X and Y start in
+the range `[ 0KB .. 2KB-1 ]`, all of the keys in X and Y will be
+converted to a filter by calling `FilterPolicy::CreateFilter()`, and the
+resulting filter will be stored as the first filter in the filter
+block.
+
+The filter block is formatted as follows:
+
+ [filter 0]
+ [filter 1]
+ [filter 2]
+ ...
+ [filter N-1]
+
+ [offset of filter 0] : 4 bytes
+ [offset of filter 1] : 4 bytes
+ [offset of filter 2] : 4 bytes
+ ...
+ [offset of filter N-1] : 4 bytes
+
+ [offset of beginning of offset array] : 4 bytes
+ lg(base) : 1 byte
+
+The offset array at the end of the filter block allows efficient
+mapping from a data block offset to the corresponding filter.
+
+## "stats" Meta Block
+
+This meta block contains a bunch of stats. The key is the name
+of the statistic. The value contains the statistic.
+
+TODO(postrelease): record following stats.
+
+ data size
+ index size
+ key size (uncompressed)
+ value size (uncompressed)
+ number of entries
+ number of data blocks