author | Wladimir J. van der Laan <laanwj@protonmail.com> | 2020-11-19 11:44:25 +0100 |
---|---|---|
committer | Wladimir J. van der Laan <laanwj@protonmail.com> | 2020-11-19 11:44:29 +0100 |
commit | e12ad7f383462c51af3d41a83f37c5e5a4c4dabc | |
tree | 57e0f7176a876b80b3900b36a9d574765f54eb18 /src | |
parent | 47b6ad837c07da6c6eb35fabbb45d5666f712ea0 | |
parent | d9141a0002bb508b2e94e206a1bd28ef8f97ffde | |
Merge #19968: doc: clarify CRollingBloomFilter size estimate
d9141a0002bb508b2e94e206a1bd28ef8f97ffde doc: clarify CRollingBloomFilter size estimate (Anthony Towns)
Pull request description:
Based on #19130, this change improves the comment for `CRollingBloomFilter` in `bloom.h`:
- Give examples to illustrate the heuristic "1.8 bytes per element per factor 0.1 of false positive rate"
- Add some Python code which can be copy/pasted for convenient filter size calculation (in an interpreter)
- Reconcile the newly added code with the existing approximation
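The reconciliation in the last bullet comes down to one line of arithmetic. Assuming `log` in the comment's formula is the natural logarithm (as with C++ `log()` and Python `math.log()`; with base-10 logs the two would not agree), each factor-of-10 step in the false positive rate adds ln 10 to log(1/fpRate), so the per-element cost per factor 0.1 is:

```latex
\frac{3}{\ln 256 \cdot \ln 2}\,\ln 10
  = \frac{3 \ln 10}{8 (\ln 2)^2}
  \approx \frac{3 \times 2.3026}{8 \times 0.48045}
  \approx 1.797 \ \text{bytes per element per factor } 0.1
```

For 1000 elements that gives roughly 1797, 3594 and 5392 bytes at rates 0.1, 0.01 and 0.001, which is where the ~1800 / ~3600 / ~5400 examples and the "1.8 bytes" heuristic come from.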
ACKs for top commit:
laanwj:
ACK d9141a0002bb508b2e94e206a1bd28ef8f97ffde
Tree-SHA512: e7138b3c531883a750ead06368975c750863fde7ef6f2633b137eca011079226e9205316217322014399fba05a48f294c788dd700bb7d479c58fe1f23e40419f
Diffstat (limited to 'src')
-rw-r--r-- | src/bloom.h | 13 |
1 files changed, 12 insertions, 1 deletions
```diff
diff --git a/src/bloom.h b/src/bloom.h
index 9307257852..24dc607cd9 100644
--- a/src/bloom.h
+++ b/src/bloom.h
@@ -94,7 +94,18 @@ public:
  * insert()'ed ... but may also return true for items that were not inserted.
  *
  * It needs around 1.8 bytes per element per factor 0.1 of false positive rate.
- * (More accurately: 3/(log(256)*log(2)) * log(1/fpRate) * nElements bytes)
+ * For example, if we want 1000 elements, we'd need:
+ * - ~1800 bytes for a false positive rate of 0.1
+ * - ~3600 bytes for a false positive rate of 0.01
+ * - ~5400 bytes for a false positive rate of 0.001
+ *
+ * If we make these simplifying assumptions:
+ * - logFpRate / log(0.5) doesn't get rounded or clamped in the nHashFuncs calculation
+ * - nElements is even, so that nEntriesPerGeneration == nElements / 2
+ *
+ * Then we get a more accurate estimate for filter bytes:
+ *
+ * 3/(log(256)*log(2)) * log(1/fpRate) * nElements
  */
 class CRollingBloomFilter
 {
```
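To gauge how much the two simplifying assumptions in the new comment matter, the sketch below puts the comment's estimate next to a sizing routine that keeps the rounding, clamping and word alignment. The second function is an assumption: it is a Python transcription of the sizing arithmetic believed to live in `CRollingBloomFilter`'s constructor in `src/bloom.cpp`, which this diff does not show, so it should be checked against the actual source before being relied on. The function names `estimated_filter_bytes` and `sized_filter_bytes` are hypothetical.

```python
import math


def estimated_filter_bytes(n_elements: int, fp_rate: float) -> float:
    """Simplified estimate quoted in the bloom.h comment (natural logarithms)."""
    return 3 / (math.log(256) * math.log(2)) * math.log(1 / fp_rate) * n_elements


def sized_filter_bytes(n_elements: int, fp_rate: float) -> int:
    """ASSUMPTION: transcription of CRollingBloomFilter's constructor sizing
    (src/bloom.cpp, not part of this diff), keeping rounding and clamping."""
    log_fp_rate = math.log(fp_rate)
    # Optimal hash count is logFpRate / log(0.5), rounded and clamped to 1..50.
    # (Python's round() ties to even, C++ round() away from zero; only exact
    # .5 ratios differ.)
    n_hash_funcs = max(1, min(round(log_fp_rate / math.log(0.5)), 50))
    # Rounds up for odd n_elements; equals n_elements / 2 when it is even.
    n_entries_per_generation = (n_elements + 1) // 2
    # Up to three generations are held at once.
    n_max_elements = n_entries_per_generation * 3
    # Solve fpRate = (1 - exp(-k * nMaxElements / nFilterBits))**k for nFilterBits.
    n_filter_bits = math.ceil(
        -n_hash_funcs * n_max_elements
        / math.log(1.0 - math.exp(log_fp_rate / n_hash_funcs))
    )
    # Two bits of state per filter position, held in pairs of 64-bit words.
    n_words = ((n_filter_bits + 63) // 64) * 2
    return n_words * 8


if __name__ == "__main__":
    for fp_rate in (0.1, 0.01, 0.001):
        print(fp_rate,
              round(estimated_filter_bytes(1000, fp_rate)),
              sized_filter_bytes(1000, fp_rate))
```

For the example rates the two columns should stay close; under this transcription the gap would be expected to widen when `nHashFuncs` is clamped at 50 (very small fpRate) or when rounding moves the hash count well away from logFpRate / log(0.5).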