aboutsummaryrefslogtreecommitdiff
path: root/src/leveldb/doc/log_format.txt
diff options
context:
space:
mode:
Diffstat (limited to 'src/leveldb/doc/log_format.txt')
-rw-r--r--src/leveldb/doc/log_format.txt75
1 files changed, 0 insertions, 75 deletions
diff --git a/src/leveldb/doc/log_format.txt b/src/leveldb/doc/log_format.txt
deleted file mode 100644
index 4cca5ef6ea..0000000000
--- a/src/leveldb/doc/log_format.txt
+++ /dev/null
@@ -1,75 +0,0 @@
-The log file contents are a sequence of 32KB blocks. The only
-exception is that the tail of the file may contain a partial block.
-
-Each block consists of a sequence of records:
- block := record* trailer?
- record :=
- checksum: uint32 // crc32c of type and data[] ; little-endian
- length: uint16 // little-endian
- type: uint8 // One of FULL, FIRST, MIDDLE, LAST
- data: uint8[length]
-
-A record never starts within the last six bytes of a block (since it
-won't fit). Any leftover bytes here form the trailer, which must
-consist entirely of zero bytes and must be skipped by readers.
-
-Aside: if exactly seven bytes are left in the current block, and a new
-non-zero length record is added, the writer must emit a FIRST record
-(which contains zero bytes of user data) to fill up the trailing seven
-bytes of the block and then emit all of the user data in subsequent
-blocks.
-
-More types may be added in the future. Some Readers may skip record
-types they do not understand, others may report that some data was
-skipped.
-
-FULL == 1
-FIRST == 2
-MIDDLE == 3
-LAST == 4
-
-The FULL record contains the contents of an entire user record.
-
-FIRST, MIDDLE, LAST are types used for user records that have been
-split into multiple fragments (typically because of block boundaries).
-FIRST is the type of the first fragment of a user record, LAST is the
-type of the last fragment of a user record, and MIDDLE is the type of
-all interior fragments of a user record.
-
-Example: consider a sequence of user records:
- A: length 1000
- B: length 97270
- C: length 8000
-A will be stored as a FULL record in the first block.
-
-B will be split into three fragments: first fragment occupies the rest
-of the first block, second fragment occupies the entirety of the
-second block, and the third fragment occupies a prefix of the third
-block. This will leave six bytes free in the third block, which will
-be left empty as the trailer.
-
-C will be stored as a FULL record in the fourth block.
-
-===================
-
-Some benefits over the recordio format:
-
-(1) We do not need any heuristics for resyncing - just go to next
-block boundary and scan. If there is a corruption, skip to the next
-block. As a side-benefit, we do not get confused when part of the
-contents of one log file are embedded as a record inside another log
-file.
-
-(2) Splitting at approximate boundaries (e.g., for mapreduce) is
-simple: find the next block boundary and skip records until we
-hit a FULL or FIRST record.
-
-(3) We do not need extra buffering for large records.
-
-Some downsides compared to recordio format:
-
-(1) No packing of tiny records. This could be fixed by adding a new
-record type, so it is a shortcoming of the current implementation,
-not necessarily the format.
-
-(2) No compression. Again, this could be fixed by adding new record types.