bees (Best-Effort Extent-Same) is a block-oriented userspace
deduplication agent designed to scale up to large btrfs filesystems.
It combines offline (out-of-band) deduplication with incremental
scanning of new data, which minimizes the time data spends on disk
between being written and being deduplicated.

Strengths:
  * Space-efficient hash table - needs as little as 1 GB of hash table
    per 10 TB of unique data (0.1 GB/TB)
  * Daemon mode - incrementally dedupes new data as it appears
  * Largest extents first - recover more free space during fixed
    maintenance windows
  * Works with btrfs compression - dedupe any combination of compressed
    and uncompressed files
  * Whole-filesystem dedupe - scans data only once, even with snapshots
    and reflinks
  * Persistent hash table for rapid restart after shutdown
  * Constant hash table size - no increased RAM usage if data set
    becomes larger
  * Works on live data - no scheduled downtime required
  * Automatic self-throttling - reduces system load
  * btrfs support - recovers more free space from btrfs than naive
    dedupers
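
As a rough illustration of the 0.1 GB/TB figure above, the following
Python sketch estimates a lower bound for the hash table size from the
amount of unique data. It is not part of bees: the function name is
hypothetical, and it assumes binary units (GiB/TiB), which the README
does not specify. Treat the result as a minimum, not a recommendation.

```python
GIB = 1024 ** 3
TIB = 1024 ** 4

# Ratio from the README: 1 GB of hash table per 10 TB of unique data.
# With binary units (an assumption) that is 1 hash byte per 10240 data bytes.
DATA_BYTES_PER_HASH_BYTE = (10 * TIB) // GIB


def min_hash_table_bytes(unique_data_bytes: int) -> int:
    """Return the smallest hash table size (bytes) at the 0.1 GB/TB ratio.

    Hypothetical helper for illustration; uses integer ceiling division
    so the result never falls below the ratio.
    """
    if unique_data_bytes < 0:
        raise ValueError("unique_data_bytes must be non-negative")
    return (unique_data_bytes + DATA_BYTES_PER_HASH_BYTE - 1) // DATA_BYTES_PER_HASH_BYTE


if __name__ == "__main__":
    for tib in (1, 10, 40):
        size = min_hash_table_bytes(tib * TIB)
        print(f"{tib:>3} TiB unique data -> at least {size / GIB:.1f} GiB hash table")
```

Because the hash table size is constant, it makes sense to estimate
from the amount of unique data the filesystem is expected to hold
rather than from what it holds today.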

Weaknesses:
  * Whole-filesystem dedupe - has no include/exclude filters, does not
    accept file lists
  * Requires root privilege (`CAP_SYS_ADMIN` plus the usual filesystem
    read/modify caps)
  * First run may increase metadata space usage if many snapshots exist
  * Constant hash table size - no decreased RAM usage if data set
    becomes smaller
  * btrfs only

After installing, edit /etc/rc.d/rc.bees.conf, /etc/logrotate.d/bees,
and /etc/bees/*.conf, and ensure /etc/rc.d/rc.bees is started from
/etc/rc.d/rc.local. To greatly reduce the volume of log output, add
"-v 6" to OPTIONS in /etc/bees/*.conf.