aboutsummaryrefslogtreecommitdiff
path: root/src
diff options
context:
space:
mode:
authorWladimir J. van der Laan <laanwj@protonmail.com>2020-11-19 11:44:25 +0100
committerWladimir J. van der Laan <laanwj@protonmail.com>2020-11-19 11:44:29 +0100
commite12ad7f383462c51af3d41a83f37c5e5a4c4dabc (patch)
tree57e0f7176a876b80b3900b36a9d574765f54eb18 /src
parent47b6ad837c07da6c6eb35fabbb45d5666f712ea0 (diff)
parentd9141a0002bb508b2e94e206a1bd28ef8f97ffde (diff)
Merge #19968: doc: clarify CRollingBloomFilter size estimate
d9141a0002bb508b2e94e206a1bd28ef8f97ffde doc: clarify CRollingBloomFilter size estimate (Anthony Towns) Pull request description: Based on #19130, this change improves the comment for `CRollingBloomFilter` in `bloom.h`: - Give examples to illustrate the heuristic "1.8 bytes per element per factor 0.1 of false positive rate" - Add some Python code which can be copy/pasted for convenient filter size calculation (in an interpreter) - Reconcile the newly added code with the existing approximation ACKs for top commit: laanwj: ACK d9141a0002bb508b2e94e206a1bd28ef8f97ffde Tree-SHA512: e7138b3c531883a750ead06368975c750863fde7ef6f2633b137eca011079226e9205316217322014399fba05a48f294c788dd700bb7d479c58fe1f23e40419f
Diffstat (limited to 'src')
-rw-r--r--src/bloom.h13
1 files changed, 12 insertions, 1 deletions
diff --git a/src/bloom.h b/src/bloom.h
index 9307257852..24dc607cd9 100644
--- a/src/bloom.h
+++ b/src/bloom.h
@@ -94,7 +94,18 @@ public:
* insert()'ed ... but may also return true for items that were not inserted.
*
* It needs around 1.8 bytes per element per factor 0.1 of false positive rate.
- * (More accurately: 3/(log(256)*log(2)) * log(1/fpRate) * nElements bytes)
+ * For example, if we want 1000 elements, we'd need:
+ * - ~1800 bytes for a false positive rate of 0.1
+ * - ~3600 bytes for a false positive rate of 0.01
+ * - ~5400 bytes for a false positive rate of 0.001
+ *
+ * If we make these simplifying assumptions:
+ * - logFpRate / log(0.5) doesn't get rounded or clamped in the nHashFuncs calculation
+ * - nElements is even, so that nEntriesPerGeneration == nElements / 2
+ *
+ * Then we get a more accurate estimate for filter bytes:
+ *
+ * 3/(log(256)*log(2)) * log(1/fpRate) * nElements
*/
class CRollingBloomFilter
{