1 files changed, 68 insertions, 56 deletions
diff --git a/bip-0158.mediawiki b/bip-0158.mediawiki
index dc28154..ad46da6 100644
--- a/bip-0158.mediawiki
+++ b/bip-0158.mediawiki
@@ -27,9 +27,8 @@ enables basic wallets and applications with more advanced smart contracts.
 [[bip-0157.mediawiki|BIP 157]] defines a light client protocol based on
 deterministic filters of block content. The filters are designed to
 minimize the expected bandwidth consumed by light clients, downloading filters
-and full blocks. This document defines two initial filter types, ''basic'' and
-''extended'', to provide support for advanced applications while reducing the
-filter size for regular wallets.
+and full blocks. This document defines the initial filter type ''basic''
+that is designed to reduce the filter size for regular wallets.
 
 == Definitions ==
 
@@ -50,8 +49,7 @@ following functions are used in the pseudocode in this document:
 * <code>write_bit(stream, b)</code> appends the bit <code>b</code> to the end of the stream
 * <code>read_bit(stream)</code> reads the next available bit from the stream
 * <code>write_bits_big_endian(stream, n, k)</code> appends the <code>k</code> least significant bits of integer <code>n</code> to the end of the stream in big-endian bit order
-* <code>read_bits_big_endian(stream, k)</code> reads the next available
-* <code>k</code> bits from the stream and interprets them as the least significant bits of a big-endian integer
+* <code>read_bits_big_endian(stream, k)</code> reads the next available <code>k</code> bits from the stream and interprets them as the least significant bits of a big-endian integer
 
 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
 "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
@@ -65,11 +63,13 @@ For each block, compact filters are derived containing sets of items associated
 with the block (eg. addresses sent to, outpoints spent, etc.). A set of such
 data objects is compressed into a probabilistic structure called a
 ''Golomb-coded set'' (GCS), which matches all items in the set with probability
-1, and matches other items with probability <code>2^(-P)</code> for some integer
-parameter <code>P</code>.
+1, and matches other items with probability <code>1/M</code> for some
+integer parameter <code>M</code>. The encoding is also parameterized by
+<code>P</code>, the bit length of the remainder code. Each filter defined
+specifies values for <code>P</code> and <code>M</code>.
 
 At a high level, a GCS is constructed from a set of <code>N</code> items by:
-# hashing all items to 64-bit integers in the range <code>[0, N * 2^P)</code>
+# hashing all items to 64-bit integers in the range <code>[0, N * M)</code>
 # sorting the hashed values in ascending order
 # computing the differences between each value and the previous one
 # writing the differences sequentially, compressed with Golomb-Rice coding
@@ -80,9 +80,13 @@ The following sections describe each step in greater detail.
 
 The first step in the filter construction is hashing the variable-sized raw
 items in the set to the range <code>[0, F)</code>, where <code>F = N *
-2^P</code>. Set membership queries against the hash outputs will have a false
-positive rate of <code>2^(-P)</code>. To avoid integer overflow, the number of
-items <code>N</code> MUST be <2^32 and <code>P</code> MUST be <=32.
+M</code>. Customarily, <code>M</code> is set to <code>2^P</code>. However, if
+one is able to select both Parameters independently, then more optimal values
+can be
+selected<ref>https://gist.github.com/sipa/576d5f09c3b86c3b1b75598d799fc845</ref>.
+Set membership queries against the hash outputs will have a false positive rate
+of <code>M</code>. To avoid integer overflow, the number of items <code>N</code>
+MUST be <2^32 and <code>M</code> MUST be <2^32.
 
 The items are first passed through the pseudorandom function ''SipHash'', which
 takes a 128-bit key <code>k</code> and a variable-sized byte vector and produces
@@ -104,9 +108,9 @@ result.
 hash_to_range(item: []byte, F: uint64, k: [16]byte) -> uint64:
     return (siphash(k, item) * F) >> 64
 
-hashed_set_construct(raw_items: [][]byte, P: uint, k: [16]byte) -> []uint64:
+hashed_set_construct(raw_items: [][]byte, k: [16]byte, M: uint) -> []uint64:
     let N = len(raw_items)
-    let F = N << P
+    let F = N * M
 
     let set_items = []
 
@@ -182,9 +186,10 @@ golomb_decode(stream, P: uint) -> uint64:
 
 ==== Set Construction ====
 
-A GCS is constructed from three parameters:
+A GCS is constructed from four parameters:
 * <code>L</code>, a vector of <code>N</code> raw items
-* <code>P</code>, which determines the false positive rate
+* <code>P</code>, the bit parameter of the Golomb-Rice coding
+* <code>M</code>, the target false positive rate
 * <code>k</code>, the 128-bit key used to randomize the SipHash outputs
 
 The result is a byte vector with a minimum size of <code>N * (P + 1)</code>
@@ -197,8 +202,8 @@ with Golomb-Rice coding. Finally, the bit stream is padded with 0's to the
 nearest byte boundary and serialized to the output byte vector.
 
 <pre>
-construct_gcs(L: [][]byte, P: uint, k: [16]byte) -> []byte:
-    let set_items = hashed_set_construct(L, P, k)
+construct_gcs(L: [][]byte, P: uint, k: [16]byte, M: uint) -> []byte:
+    let set_items = hashed_set_construct(L, k, M)
 
     set_items.sort()
 
@@ -224,8 +229,8 @@ against the reconstructed values. Note that querying does not require the entire
 decompressed set be held in memory at once.
 
 <pre>
-gcs_match(key: [16]byte, compressed_set: []byte, target: []byte, P: uint, N: uint) -> bool:
-    let F = N << P
+gcs_match(key: [16]byte, compressed_set: []byte, target: []byte, P: uint, N: uint, M: uint) -> bool:
+    let F = N * M
     let target_hash = hash_to_range(target, F, k)
 
     stream = new_bit_stream(compressed_set)
@@ -258,49 +263,55 @@ against the decompressed GCS contents. See
 
 === Block Filters ===
 
-This BIP defines two initial filter types:
+This BIP defines one initial filter type:
 * Basic (<code>0x00</code>)
-* Extended (<code>0x01</code>)
+** <code>M = 784931</code>
+** <code>P = 19</code>
 
 ==== Contents ====
 
 The basic filter is designed to contain everything that a light client needs to
-sync a regular Bitcoin wallet. A basic filter MUST contain exactly the following
-items for each transaction in a block:
-* The outpoint of each input, except for the coinbase transaction
-* Each data push in the scriptPubKey of each output, ''only if'' the scriptPubKey is parseable
-* The <code>txid</code> of the transaction itself
-
-The extended filter contains extra data that is meant to enable applications
-with more advanced smart contracts. An extended filter MUST contain exactly the
-following items for each transaction in a block ''except the coinbase'':
-* Each item within the witness stack of each input (if the input has a witness)
-* Each data push in the scriptSig of each input
-
-Note that neither filter type interprets P2SH scripts or witness scripts to
-extract data pushes from them. If necessary, future filter types may be designed
-to do so.
+sync a regular Bitcoin wallet. A basic filter MUST contain exactly the
+following items for each transaction in a block:
+* The previous output script (the script being spent) for each input, except
+  for the coinbase transaction.
+* The scriptPubKey of each output, aside from all <code>OP_RETURN</code> output
+  scripts.
+
+Any "nil" items MUST NOT be included into the final set of filter elements.
+
+We exclude all outputs that start with <code>OP_RETURN</code> in order to allow
+filters to easily be committed to in the future via a soft-fork. A likely area
+for future commitments is an additional <code>OP_RETURN</code> output in the
+coinbase transaction similar to the current witness commitment
+<ref>https://github.com/bitcoin/bips/blob/master/bip-0141.mediawiki</rev>. By
+excluding all <code>OP_RETURN</code> outputs we avoid a circular dependency
+between the commitment, and the item being committed to.
 
 ==== Construction ====
 
-Both the basic and extended filter types are constructed as Golomb-coded sets
-with the following parameters.
+The basic type is constructed as Golomb-coded sets with the following
+parameters.
 
-The parameter <code>P</code> MUST be set to <code>20</code>. This value was
-chosen as simulations show that it minimizes the bandwidth utilized, considering
-both the expected number of blocks downloaded due to false positives and the
-size of the filters themselves. The code along with a demo used for the
-parameter tuning can be found
-[https://github.com/Roasbeef/bips/blob/83b83c78e189be898573e0bfe936dd0c9b99ecb9/gcs_light_client/gentestvectors.go here].
+The parameter <code>P</code> MUST be set to <code>19</code>, and the parameter
+<code>M</code> MUST be set to <code>784931</code>. Analysis has shown that if
+one is able to select <code>P</code> and <code>M</code> independently, then
+setting <code>M=1.497137 * 2^P</code> is close to optimal
+<ref>https://gist.github.com/sipa/576d5f09c3b86c3b1b75598d799fc845</ref>.
 
-The parameter <code>k</code> MUST be set to the first 16 bytes of the hash of
-the block for which the filter is constructed. This ensures the key is
-deterministic while still varying from block to block.
+Empirical analysis also shows that was chosen as these parameters minimize the
+bandwidth utilized, considering both the expected number of blocks downloaded
+due to false positives and the size of the filters themselves.
+
+The parameter <code>k</code> MUST be set to the first 16 bytes of the hash
+(in standard little-endian representation) of the block for which the filter is
+constructed. This ensures the key is deterministic while still varying from
+block to block.
 
 Since the value <code>N</code> is required to decode a GCS, a serialized GCS
-includes it as a prefix, written as a CompactSize. Thus, the complete
-serialization of a filter is:
-* <code>N</code>, encoded as a CompactSize
+includes it as a prefix, written as a <code>CompactSize</code>. Thus, the
+complete serialization of a filter is:
+* <code>N</code>, encoded as a <code>CompactSize</code>
 * The bytes of the compressed filter itself
 
 ==== Signaling ====
@@ -323,7 +334,8 @@ though it requires implementation of the new filters.
 
 We would like to thank bfd (from the bitcoin-dev mailing list) for bringing the
 basis of this BIP to our attention, Greg Maxwell for pointing us in the
-direction of Golomb-Rice coding and fast range optimization, and Pedro
+direction of Golomb-Rice coding and fast range optimization, Pieter Wullie for
+his analysis of optimal GCS parameters, and Pedro
 Martelletto for writing the initial indexing code for <code>btcd</code>.
 
 We would also like to thank Dave Collins, JJ Jeffrey, and Eric Lombrozo for
@@ -335,7 +347,7 @@ Light client: [https://github.com/lightninglabs/neutrino]
 
 Full-node indexing: https://github.com/Roasbeef/btcd/tree/segwit-cbf
 
-Golomb-Rice Coded sets: https://github.com/Roasbeef/btcutil/tree/gcs/gcs
+Golomb-Rice Coded sets: https://github.com/btcsuite/btcutil/blob/master/gcs
 
 == Appendix A: Alternatives ==
 
@@ -375,8 +387,8 @@ easier to understand.
 === Golomb-Coded Set Multi-Match ===
 
 <pre>
-gcs_match_any(key: [16]byte, compressed_set: []byte, targets: [][]byte, P: uint, N: uint) -> bool:
-    let F = N << P
+gcs_match_any(key: [16]byte, compressed_set: []byte, targets: [][]byte, P: uint, N: uint, M: uint) -> bool:
+    let F = N * M
 
     // Map targets to the same range as the set hashes.
     let target_hashes = []
@@ -420,7 +432,7 @@ gcs_match_any(key: [16]byte, compressed_set: []byte, targets: [][]byte, P: uint,
 
 == Appendix C: Test Vectors ==
 
-TODO: To be generated.
+Test vectors for basic block filters on five testnet blocks, including the filters and filter headers, can be found [[bip-0158/testnet-19.json|here]]. The code to generate them can be found [[bip-0158/gentestvectors.go|here]].
 
 == References ==
 
@@ -428,4 +440,4 @@ TODO: To be generated.
 
 == Copyright ==
 
-This document is licensed under the  Creative Commons CC0 1.0 Universal lisence.
+This document is licensed under the  Creative Commons CC0 1.0 Universal license.