Diffstat (limited to 'bip-0158.mediawiki')
 -rw-r--r--  bip-0158.mediawiki | 99
 1 file changed, 56 insertions(+), 43 deletions(-)
diff --git a/bip-0158.mediawiki b/bip-0158.mediawiki
index 15caa68..bf2e856 100644
--- a/bip-0158.mediawiki
+++ b/bip-0158.mediawiki
@@ -65,11 +65,14 @@ For each block, compact filters are derived containing sets of items associated
with the block (e.g. addresses sent to, outpoints spent, etc.). A set of such
data objects is compressed into a probabilistic structure called a
''Golomb-coded set'' (GCS), which matches all items in the set with probability
-1, and matches other items with probability <code>2^(-P)</code> for some integer
-parameter <code>P</code>.
+1, and matches other items with probability <code>2^(-P)</code> for some
+integer parameter <code>P</code>. We also introduce a parameter <code>M</code>
+that allows each filter type to tune the range that items are hashed onto
+before compression. Each defined filter type selects distinct values for
+<code>P</code> and <code>M</code>.
At a high level, a GCS is constructed from a set of <code>N</code> items by:
-# hashing all items to 64-bit integers in the range <code>[0, N * 2^P)</code>
+# hashing all items to 64-bit integers in the range <code>[0, N * M)</code>
# sorting the hashed values in ascending order
# computing the differences between each value and the previous one
# writing the differences sequentially, compressed with Golomb-Rice coding
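The following sketch (non-normative) shows this whole pipeline in Go. The
<code>hashToRange</code>, <code>bitWriter</code>, and <code>golombEncode</code>
helpers are illustrative assumptions, sketched in the sections that follow.

<pre>
import "sort"

// buildGCS compresses raw items into a Golomb-coded set, following
// the four steps above. p is the Rice parameter and m the hash range
// multiplier.
func buildGCS(items [][]byte, k [16]byte, p uint, m uint64) []byte {
	n := uint64(len(items))
	f := n * m

	// 1. Hash all items to 64-bit integers in [0, N * M).
	hashed := make([]uint64, 0, n)
	for _, item := range items {
		hashed = append(hashed, hashToRange(item, f, k))
	}

	// 2. Sort the hashed values in ascending order.
	sort.Slice(hashed, func(i, j int) bool { return hashed[i] < hashed[j] })

	// 3 + 4. Write the difference of each value from the previous one,
	// compressed with Golomb-Rice coding.
	var w bitWriter
	var prev uint64
	for _, v := range hashed {
		golombEncode(&w, v-prev, p)
		prev = v
	}

	// The final byte is implicitly padded with 0 bits.
	return w.bytes()
}
</pre>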
@@ -80,9 +83,13 @@ The following sections describe each step in greater detail.
The first step in the filter construction is hashing the variable-sized raw
items in the set to the range <code>[0, F)</code>, where <code>F = N *
-2^P</code>. Set membership queries against the hash outputs will have a false
-positive rate of <code>2^(-P)</code>. To avoid integer overflow, the number of
-items <code>N</code> MUST be <2^32 and <code>P</code> MUST be <=32.
+M</code>. Customarily, <code>M</code> is set to <code>2^P</code>. However, if
+one is able to select both parameters independently, then more optimal values
+can be selected<ref>https://gist.github.com/sipa/576d5f09c3b86c3b1b75598d799fc845</ref>.
+Set membership queries against the hash outputs will have a false positive rate
+of <code>2^(-P)</code>. To avoid integer overflow, the
+number of items <code>N</code> MUST be <2^32 and <code>M</code> MUST be <2^32.
The items are first passed through the pseudorandom function ''SipHash'', which
takes a 128-bit key <code>k</code> and a variable-sized byte vector and produces
@@ -104,9 +111,9 @@ result.
hash_to_range(item: []byte, F: uint64, k: [16]byte) -> uint64:
return (siphash(k, item) * F) >> 64
-hashed_set_construct(raw_items: [][]byte, P: uint, k: [16]byte) -> []uint64:
+hashed_set_construct(raw_items: [][]byte, k: [16]byte, M: uint) -> []uint64:
let N = len(raw_items)
- let F = N << P
+ let F = N * M
let set_items = []
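One possible Go rendering of <code>hash_to_range</code> uses
<code>math/bits.Mul64</code> for the full 128-bit product. The SipHash
dependency shown (github.com/dchest/siphash) is one implementation choice,
not something this BIP mandates.

<pre>
import (
	"encoding/binary"
	"math/bits"

	"github.com/dchest/siphash" // any SipHash-2-4 implementation works
)

// hashToRange maps item uniformly onto [0, f) by taking the high 64
// bits of the 128-bit product siphash(k, item) * f, which is exactly
// (siphash(k, item) * f) >> 64 from the pseudocode above.
func hashToRange(item []byte, f uint64, k [16]byte) uint64 {
	k0 := binary.LittleEndian.Uint64(k[0:8])
	k1 := binary.LittleEndian.Uint64(k[8:16])
	hi, _ := bits.Mul64(siphash.Hash(k0, k1, item), f)
	return hi
}
</pre>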
@@ -197,8 +204,8 @@ with Golomb-Rice coding. Finally, the bit stream is padded with 0's to the
nearest byte boundary and serialized to the output byte vector.
<pre>
-construct_gcs(L: [][]byte, P: uint, k: [16]byte) -> []byte:
- let set_items = hashed_set_construct(L, P, k)
+construct_gcs(L: [][]byte, P: uint, k: [16]byte, M: uint) -> []byte:
+ let set_items = hashed_set_construct(L, k, M)
set_items.sort()
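As a sketch of the Golomb-Rice step itself: the <code>bitWriter</code> type
below is an illustrative MSB-first bit accumulator, not part of the
specification.

<pre>
// bitWriter accumulates bits most-significant-first; the final byte
// is left padded with 0 bits, as required for serialization.
type bitWriter struct {
	buf  []byte
	nbit uint
}

func (w *bitWriter) writeBit(b uint64) {
	if w.nbit%8 == 0 {
		w.buf = append(w.buf, 0)
	}
	w.buf[len(w.buf)-1] |= byte(b&1) << (7 - w.nbit%8)
	w.nbit++
}

func (w *bitWriter) bytes() []byte { return w.buf }

// golombEncode writes delta as a Golomb-Rice codeword: the quotient
// delta >> p in unary (that many 1 bits followed by a 0), then the p
// least significant bits of delta, most significant bit first.
func golombEncode(w *bitWriter, delta uint64, p uint) {
	for q := delta >> p; q > 0; q-- {
		w.writeBit(1)
	}
	w.writeBit(0)
	for i := int(p) - 1; i >= 0; i-- {
		w.writeBit(delta >> uint(i))
	}
}
</pre>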
@@ -224,8 +231,8 @@ against the reconstructed values. Note that querying does not require the entire
decompressed set be held in memory at once.
<pre>
-gcs_match(key: [16]byte, compressed_set: []byte, target: []byte, P: uint, N: uint) -> bool:
- let F = N << P
+gcs_match(key: [16]byte, compressed_set: []byte, target: []byte, P: uint, N: uint, M: uint) -> bool:
+ let F = N * M
let target_hash = hash_to_range(target, F, key)
stream = new_bit_stream(compressed_set)
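A matching Go sketch of the query side, reusing <code>hashToRange</code> from
above. <code>bitReader</code> is again an illustrative helper, and error
handling for truncated streams is omitted.

<pre>
// bitReader mirrors bitWriter, reading bits most-significant-first.
type bitReader struct {
	buf  []byte
	nbit uint
}

func (r *bitReader) readBit() uint64 {
	b := uint64(r.buf[r.nbit/8]>>(7-r.nbit%8)) & 1
	r.nbit++
	return b
}

// golombDecode reverses golombEncode: count the unary quotient, then
// read the p-bit remainder.
func golombDecode(r *bitReader, p uint) uint64 {
	var q uint64
	for r.readBit() == 1 {
		q++
	}
	var rem uint64
	for i := uint(0); i < p; i++ {
		rem = rem<<1 | r.readBit()
	}
	return q<<p | rem
}

// gcsMatch reports whether target is possibly in the compressed set,
// decompressing one delta at a time rather than the whole set.
func gcsMatch(key [16]byte, compressed []byte, target []byte, p uint, n, m uint64) bool {
	targetHash := hashToRange(target, n*m, key)
	r := &bitReader{buf: compressed}
	var value uint64
	for i := uint64(0); i < n; i++ {
		value += golombDecode(r, p)
		if value == targetHash {
			return true
		}
		if value > targetHash {
			// Values are sorted, so no later value can match.
			return false
		}
	}
	return false
}
</pre>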
@@ -258,49 +265,54 @@ against the decompressed GCS contents. See
=== Block Filters ===
-This BIP defines two initial filter types:
+This BIP defines one initial filter type:
* Basic (<code>0x00</code>)
-* Extended (<code>0x01</code>)
+** <code>M = 784931</code>
+** <code>P = 19</code>
==== Contents ====
The basic filter is designed to contain everything that a light client needs to
-sync a regular Bitcoin wallet. A basic filter MUST contain exactly the following
-items for each transaction in a block:
-* The outpoint of each input, except for the coinbase transaction
-* The scriptPubKey of each output
-* The <code>txid</code> of the transaction itself
-
-The extended filter contains extra data that is meant to enable applications
-with more advanced smart contracts. An extended filter MUST contain exactly the
-following items for each transaction in a block ''except the coinbase'':
-* Each item within the witness stack of each input (if the input has a witness)
-* Each data push in the scriptSig of each input
-
-Note that neither filter type interprets P2SH scripts or witness scripts to
-extract data pushes from them. If necessary, future filter types may be designed
-to do so.
+sync a regular Bitcoin wallet. A basic filter MUST contain exactly the
+following items for each transaction in a block:
+* The previous output script (the script being spent) for each input, except
+ for the coinbase transaction.
+* The scriptPubKey of each output, aside from all <code>OP_RETURN</code> output
+ scripts.
+
+Any "nil" items MUST NOT be included into the final set of filter elements.
+
+We exclude all <code>OP_RETURN</code> outputs in order to allow filters to
+easily be committed to in the future via a soft-fork. A likely area for future
+commitments is an additional <code>OP_RETURN</code> output in the coinbase
+transaction, similar to the current witness commitment
+<ref>https://github.com/bitcoin/bips/blob/master/bip-0141.mediawiki</ref>. By
+excluding all <code>OP_RETURN</code> outputs, we avoid a circular dependency
+between the commitment and the item being committed to.
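As a sketch of item collection, the types below are simplified stand-ins
rather than any particular library's API; in practice the previous output
scripts must be fetched from a UTXO view or similar index.

<pre>
// txIn and txOut are simplified stand-ins: prevOutScript is the
// scriptPubKey of the output this input spends, fetched externally.
type txIn struct{ prevOutScript []byte }
type txOut struct{ pkScript []byte }
type tx struct {
	ins  []txIn
	outs []txOut
}

const opReturn = 0x6a

// basicFilterItems gathers the raw items for a basic filter: every
// previous output script being spent (the coinbase spends nothing),
// and every output script that is neither empty ("nil") nor an
// OP_RETURN script.
func basicFilterItems(txs []tx) [][]byte {
	var items [][]byte
	for i, t := range txs {
		if i > 0 { // skip the inputs of the coinbase transaction
			for _, in := range t.ins {
				if len(in.prevOutScript) > 0 {
					items = append(items, in.prevOutScript)
				}
			}
		}
		for _, out := range t.outs {
			if len(out.pkScript) == 0 || out.pkScript[0] == opReturn {
				continue // drop nil and OP_RETURN output scripts
			}
			items = append(items, out.pkScript)
		}
	}
	return items
}
</pre>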
==== Construction ====
-Both the basic and extended filter types are constructed as Golomb-coded sets
-with the following parameters.
+The basic filter type is constructed as a Golomb-coded set with the following
+parameters.
-The parameter <code>P</code> MUST be set to <code>20</code>. This value was
-chosen as simulations show that it minimizes the bandwidth utilized, considering
-both the expected number of blocks downloaded due to false positives and the
-size of the filters themselves. The code along with a demo used for the
-parameter tuning can be found
-[https://github.com/Roasbeef/bips/blob/83b83c78e189be898573e0bfe936dd0c9b99ecb9/gcs_light_client/gentestvectors.go here].
+The parameter <code>P</code> MUST be set to <code>19</code>, and the parameter
+<code>M</code> MUST be set to <code>784931</code>. Analysis has shown that if
+one is able to select <code>P</code> and <code>M</code> independently, then
+setting <code>M = 1.497137 * 2^P</code> is close to optimal; for
+<code>P = 19</code>, this yields <code>1.497137 * 2^19 ≈ 784931</code>
+<ref>https://gist.github.com/sipa/576d5f09c3b86c3b1b75598d799fc845</ref>.
+
+Empirical analysis also shows that these parameters minimize the total
+bandwidth utilized, considering both the expected number of blocks downloaded
+due to false positives and the size of the filters themselves.
The parameter <code>k</code> MUST be set to the first 16 bytes of the hash of
the block for which the filter is constructed. This ensures the key is
deterministic while still varying from block to block.
Since the value <code>N</code> is required to decode a GCS, a serialized GCS
-includes it as a prefix, written as a CompactSize. Thus, the complete
-serialization of a filter is:
-* <code>N</code>, encoded as a CompactSize
+includes it as a prefix, written as a <code>CompactSize</code>. Thus, the
+complete serialization of a filter is:
+* <code>N</code>, encoded as a <code>CompactSize</code>
* The bytes of the compressed filter itself
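Putting the pieces together, a non-normative sketch of full filter
serialization for a block, reusing <code>buildGCS</code> from above. The
<code>appendCompactSize</code> helper implements Bitcoin's variable-length
integer, which differs from Go's <code>binary.PutUvarint</code>; taking the
first 16 bytes of the serialized block hash for <code>k</code> is shown as
written, with byte order following the hash's standard internal
representation.

<pre>
import "encoding/binary"

// serializeFilter builds and serializes a basic filter for a block:
// the SipHash key k is the first 16 bytes of the block hash, and the
// output is N as a CompactSize followed by the compressed bytes.
func serializeFilter(blockHash [32]byte, items [][]byte) []byte {
	var k [16]byte
	copy(k[:], blockHash[:16])

	filter := buildGCS(items, k, 19, 784931) // P = 19, M = 784931
	out := appendCompactSize(nil, uint64(len(items)))
	return append(out, filter...)
}

// appendCompactSize appends v in Bitcoin's CompactSize encoding.
func appendCompactSize(b []byte, v uint64) []byte {
	switch {
	case v < 0xfd:
		return append(b, byte(v))
	case v <= 0xffff:
		return binary.LittleEndian.AppendUint16(append(b, 0xfd), uint16(v))
	case v <= 0xffffffff:
		return binary.LittleEndian.AppendUint32(append(b, 0xfe), uint32(v))
	default:
		return binary.LittleEndian.AppendUint64(append(b, 0xff), v)
	}
}
</pre>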
==== Signaling ====
@@ -323,7 +335,8 @@ though it requires implementation of the new filters.
We would like to thank bfd (from the bitcoin-dev mailing list) for bringing the
basis of this BIP to our attention, Greg Maxwell for pointing us in the
-direction of Golomb-Rice coding and fast range optimization, and Pedro
+direction of Golomb-Rice coding and fast range optimization, Pieter Wuille for
+his analysis of optimal GCS parameters, and Pedro
Martelletto for writing the initial indexing code for <code>btcd</code>.
We would also like to thank Dave Collins, JJ Jeffrey, and Eric Lombrozo for
@@ -375,8 +388,8 @@ easier to understand.
=== Golomb-Coded Set Multi-Match ===
<pre>
-gcs_match_any(key: [16]byte, compressed_set: []byte, targets: [][]byte, P: uint, N: uint) -> bool:
- let F = N << P
+gcs_match_any(key: [16]byte, compressed_set: []byte, targets: [][]byte, P: uint, N: uint, M: uint) -> bool:
+ let F = N * M
// Map targets to the same range as the set hashes.
let target_hashes = []
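A hedged Go sketch of the multi-match routine, reusing the helpers from the
earlier sketches: targets are hashed into the same range, sorted, and walked
alongside the decompressed set in a single merge-style pass.

<pre>
import "sort"

// gcsMatchAny reports whether any of the targets is possibly in the
// compressed set, decompressing the set only once.
func gcsMatchAny(key [16]byte, compressed []byte, targets [][]byte, p uint, n, m uint64) bool {
	// Map targets to the same range as the set hashes, then sort.
	f := n * m
	targetHashes := make([]uint64, 0, len(targets))
	for _, t := range targets {
		targetHashes = append(targetHashes, hashToRange(t, f, key))
	}
	sort.Slice(targetHashes, func(i, j int) bool {
		return targetHashes[i] < targetHashes[j]
	})

	r := &bitReader{buf: compressed}
	var value uint64
	ti := 0
	for i := uint64(0); i < n && ti < len(targetHashes); i++ {
		value += golombDecode(r, p)
		// Skip target hashes smaller than the current set value;
		// both sequences are sorted, so those can never match.
		for ti < len(targetHashes) && targetHashes[ti] < value {
			ti++
		}
		if ti < len(targetHashes) && targetHashes[ti] == value {
			return true
		}
	}
	return false
}
</pre>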