summaryrefslogtreecommitdiff
path: root/bip-0330.mediawiki
diff options
context:
space:
mode:
authorGleb Naumenko <naumenko.gs@gmail.com>2020-03-04 14:50:25 -0500
committerGleb Naumenko <naumenko.gs@gmail.com>2022-09-29 14:43:19 +0300
commit2f98abd9ea283049ad0d36bee2828afd7d86ff96 (patch)
treeec38cb051ba018a8d4829446bb01cbcaa7598d72 /bip-0330.mediawiki
parentcb071df902eafb7054635201a8b12e76f42774ad (diff)
Changes/clarifications to bip-330.
Diffstat (limited to 'bip-0330.mediawiki')
-rw-r--r--bip-0330.mediawiki199
1 files changed, 102 insertions, 97 deletions
diff --git a/bip-0330.mediawiki b/bip-0330.mediawiki
index 581b6ae..87a4fd2 100644
--- a/bip-0330.mediawiki
+++ b/bip-0330.mediawiki
@@ -29,19 +29,24 @@ Increasing the connectivity of the network makes the network more robust to part
[https://arxiv.org/pdf/1905.10518.pdf Erlay] is an example of a high-level transaction relay protocol which employs set reconciliation for bandwidth efficiency.
+Note that what we are going to describe here is a modified version from the protocol (it is different from what is presented in the paper).
+
Erlay uses both flooding (announcing using INV messages to all peers) and reconciliation to announce transactions.
-Flooding is expensive, so Erlay seeks to use it sparingly and in strategic locations - only well-connected publicly reachable nodes flood transactions to other publicly reachable nodes via outbound connections.
-Since every unreachable node is directly connected to several reachable nodes, this policy ensures that a transaction is quickly propagated to be within one hop from most of the nodes in the network.
+Flooding is expensive, so Erlay seeks to use it only when necessary to facilitate rapid relay over a small subset of connections.
+
+Efficient set reconciliation is meant to deliver transactions to those nodes which didn't receive a transaction via flooding, and also just make sure remaining connections are in sync (directly connected pairs of nodes are aware they have nothing to learn from each other).
+
+Efficient set reconciliation works as follows:
+1) every node keeps a reconciliation set for each peer, in which transactions are placed which would have been announced using INV messages absent this protocol
+2) once in a while every node chooses a peer from its reconciliation queue to reconcile with, resulting in both sides learning the transactions known to the other side
+3) after every reconciliation round, the corresponding reconciliation set is cleared
-All transactions not propagated through flooding are propagated through efficient set reconciliation.
-To do this, every node keeps a reconciliation set for each peer, in which transactions are placed which would have been announced using INV messages absent this protocol. Every 2 seconds every node chooses a peer from its outbound connections in a predetermined order to reconcile with, resulting in both sides learning the transactions known to the other side. After every reconciliation round, the corresponding reconciliation set is cleared.
-A more detailed description of a set reconciliation round and other implementation details can be found in the paper.
+A more detailed description of a set reconciliation round can be found below.
Erlay allows us to:
-* save 40% of the bandwidth consumed by a node, given typical network connectivity as of July 2019.
-* achieve similar latency
+* save a significant portion of the bandwidth consumed by a node
* increase network connectivity for almost no bandwidth or latency cost
-* improves privacy as a side-effect
+* keep transaction propagation latency at the same level
This document proposes a P2P-layer extension which is required to enable efficient reconciliation-based protocols (like Erlay) for transaction relay.
@@ -52,13 +57,11 @@ This document proposes a P2P-layer extension which is required to enable efficie
Several new data structures are introduced to the P2P protocol first, to aid with efficient transaction relay.
====32-bit short transaction IDs====
-
-During reconciliation, significantly abbreviated transaction IDs are used of just 32 bits in size. To prevent attackers from constructing sets of transactions that cause network-wide collisions, the short ID computation is salted on a per-link basis using 64 bits of entropy contributed by both communication partners.
-
+=
Short IDs are computed as follows:
-* Let ''salt<sub>1</sub>'' and ''salt<sub>2</sub>'' be the entropy contributed by both sides; see the "sendrecon" message further for details how they are exchanged.
+* Let ''salt<sub>1</sub>'' and ''salt<sub>2</sub>'' be the entropy contributed by both sides; see the "sendtxrcncl" message further for details how they are exchanged.
* Sort the two salts such that ''salt<sub>1</sub> &le; salt<sub>2</sub>'' (which side sent what doesn't matter).
-* Compute ''h = SHA256("Tx Relay Salting" || salt<sub>1</sub> || salt<sub>2</sub>)'', where the two salts are encoded in 64-bit little-endian byte order.
+* Compute ''h = TaggedHash("Tx Relay Salting", salt<sub>1</sub>, salt<sub>2</sub>)'', where the two salts are encoded in 64-bit little-endian byte order, and TaggedHash is specified by [https://github.com/bitcoin/bips/blob/master/bip-0340.mediawiki BIP-340].
* Let ''k<sub>0</sub>'' be the 64-bit integer obtained by interpreting the first 8 bytes of ''h'' in little-endian byte order.
* Let ''k<sub>1</sub>'' be the 64-bit integer obtained by interpreting the second 8 bytes of ''h'' in little-endian byte order.
* Let ''s = SipHash-2-4((k<sub>0</sub>,k<sub>1</sub>),wtxid)'', where ''wtxid'' is the transaction hash including witness data as defined by BIP141.
@@ -104,69 +107,70 @@ def create_sketch(shortids, capacity):
The [https://github.com/sipa/minisketch/ minisketch] library implements the construction, merging, and decoding of these sketches efficiently.
-====Truncated transaction IDs====
-
-For announcing and relaying transaction outside of reconciliation, we need an unambiguous, unsalted way to refer to transactions to deduplicate transaction requests. As we're introducing a new scheme anyway, this is a good opportunity to switch to wtxid-based requests rather than txid-based ones. While using full 256-bit wtxids is possible, this is overkill as they contribute significantly to the total bandwidth as well. Instead, we truncate the wtxid to just their first 128 bits. These are referred to as truncated IDs.
-
===Intended Protocol Flow===
Set reconciliation primarily consists of the transmission and decoding of a reconciliation set sketch upon request.
-[[File:bip-0330/recon_scheme_merged.png|framed|center|Set reconciliation protocol flow]]
+Since sketches are based on the WTXIDs, the negotiation and support of Erlay should be enabled only if both peers signal [https://github.com/bitcoin/bips/blob/master/bip-0339.mediawiki BIP-339] support.
-====Bisection====
+[[File:bip-0330/recon_scheme_merged.png|framed|center|Protocol flow]]
-If a node is unable to reconstruct the set difference from the received sketch, the node then makes an additional reconciliation request, similar to the initial one, but this request is applied to only a fraction of possible transactions (e.g., in the range 0x0–0x8). Because of the linearity of sketches, a sketch of a subset of transactions would allow the node to compute a sketch for the remainder, which saves bandwidth.
+====Sketch extension====
-[[File:bip-0330/bisection.png|framed|300px|center|Bisection]]
+If a node is unable to reconstruct the set difference from the received sketch, the node then makes a request for sketch extension. The peer would then send an extension, which is a sketch of a higher capacity (allowing to decode more differences) over the same transactions minus the sketch part which was already sent initially (to save bandwidth).
+To allow this optimization, the initiator is supposed to locally store a sketch received initially.
+This optimization is possible because extending a sketch is just concatenating new elements to an array.
===New messages===
-Several new protocol messages are added: sendrecon, reqreconcil, sketch, reqbisec, reconcildiff, invtx, gettx. This section describes their serialization, contents, and semantics.
+Several new protocol messages are added: sendtxrcncl, reqrecon, sketch, reqsketchext, reconcildiff. This section describes their serialization, contents, and semantics.
In what follows, all integers are serialized in little-endian byte order. Boolean values are encoded as a single byte that must be 0 or 1 exactly. Arrays are serialized with the CompactSize prefix that encodes their length, as is common in other P2P messages.
-====sendrecon====
-The sendrecon message announces support for the reconciliation protocol. It is expected to be only sent once, and ignored by nodes that don't support it.
+====sendtxrcncl====
+The sendtxrcncl message announces support for the reconciliation protocol. It is expected to be only sent once, and ignored by nodes that don't support it.
+
+Should be sent before "verack" and accompanied by "wtxidrelay" (in any order).
+
+If "sendtxrcncl" was sent after "verack", the sender should be disconnected.
+
+If "sendtxrcncl" was sent before "verack", but by "verack" the "wtxidrelay" message was not received,
+"sendtxrcncl" should be ignored. The connection should proceed normally, but as if reconciliation
+was not supported.
+
+Must not be sent if peer specified no support for transaction relay (fRelay=0) in "version".
+Otherwise, the sender should be disconnected.
Its payload consists of:
{|class="wikitable"
! Data type !! Name !! Description
|-
-| bool || sender || Indicates whether the sender will send "reqreconcil" message
+| bool || initiator || Indicates whether the sender will send "reqrecon" message
|-
-| bool || responder || Indicates whether the sender will respond to "reqreconcil" messages.
+| bool || responder || Indicates whether the sender will respond to "reqrecon" messages.
|-
-| uint32 || version || Sender must set this to 1 currently, otherwise receiver should ignore the message.
+| uint32 || version || Sender must set this to 1 currently, otherwise receiver should ignore the message. v1 is the lowest protocol version, everything below that is a protocol violation.
|-
| uint64 || salt || The salt used in the short transaction ID computation.
|}
-"reqreconcil" messages can only be sent if the sender has sent a "sendrecon" message with sender=true, and the receiver has sent a "sendrecon" message with responder=true.
+"reqrecon" messages can only be sent if the inititor has sent a "sendtxrcncl" message with initiator=true, and the receiver has sent a "sendtxrcncl" message with responder=true.
-====reqreconcil====
-The reqreconcil message initiates a reconciliation round.
+====reqrecon====
+The reqrecon message initiates a reconciliation round.
{|class="wikitable"
! Data type !! Name !! Description
|-
| uint16 || set_size || Size of the sender's reconciliation set, used to estimate set difference.
|-
-| uint8 || q || Coefficient used to estimate set difference. Multiplied by PRECISION=2^6 and rounded up by the sender and divided by PRECISION by the receiver.
+| uint16 || q || Coefficient used to estimate set difference. Multiplied by PRECISION=(2^15) - 1 and rounded up by the sender and divided by PRECISION by the receiver.
|}
-Upon receipt of a "reqreconcil" message, the receiver:
-* Constructs and sends a "sketch" message (see below), with a sketch of capacity computed as ''|set_size - local_set_size| + q * (set_size + local_set_size) + c'', where ''local_set_size'' represents size of the receiver's reconciliation set.
+Upon receipt of a "reqrecon" message, the receiver:
+* Constructs and sends a "sketch" message (see below), with a sketch of certain ''capacity=f(set_size, local_set_size, q)'' (the exact function is suggested below), where ''local_set_size'' represents size of the receiver's reconciliation set.
* Makes a snapshot of their current reconciliation set, and clears the set itself. The snapshot is kept until a "reconcildiff" message is received by the node.
-It is suggested to use ''c=1'' to avoid sending empty sketches and reduce the overhead caused by under-estimations.
-
-Intuitively, ''q'' represents the discrepancy in sets: the closer the sets are, the lower optimal ''q'' is.
-As suggested by Erlay, ''q'' should be derived as an optimal ''q'' value for the previous reconciliation with a given peer, once the actual set sizes and set difference are known. Alternatively, ''q=0.1'' should be used as a default value.
-For example, if in previous round ''set_size=30'' and ''local_set_size=20'', and the *actual* difference was ''4'', then a node should compute ''q'' as following:
-''q=(|30-20| - 1) / (30+20)=0.18''
-The derivation of ''q'' can be changed according to the version of the protocol.
-
-No new "reqreconcil" message can be sent until a "reconcildiff" message is sent.
+No new "reqrecon" message can be sent until a "reconcildiff" message is sent.
====sketch====
The sketch message is used to communicate a sketch required to perform set reconciliation.
@@ -177,21 +181,29 @@ The sketch message is used to communicate a sketch required to perform set recon
| byte[] || skdata || The sketch of the sender's reconciliation snapshot
|}
-Upon receipt of a "sketch" message, a node computes the set difference by combining the receiver sketch with a sketch computed locally for a corresponding reconciliation set. If this is the 2nd time for this round a "sketch" message was received, the bisection approach is used, and by combining the new sketch with the previous one, two difference sketches are obtained, one for the first half and one for the second half of the short id range. The receiving node then tries to decode this sketch (or sketches), and based on the result:
-* If decoding fails, a "reconcildiff" message is sent with the failure flag set (success=false). If this was the first "sketch" in the round, a "reqbisec" message may be sent instead.
-* If decoding succeeds, a "reconcildiff" message is sent with the truncated IDs of all locally known transactions that appear in the decode result, and the short IDs of the unrecognized ones.
+The sketch message may be received in two cases.
+
+1. Initial sketch. Upon receipt of a "sketch" message, a node computes the difference sketch by combining the received sketch with a sketch computed locally for a corresponding reconciliation set. The receiving node then tries to decode the difference sketch and based on the result:
+* If the decoding failed, the receiving node requests an extension sketch by sending a "reqsketchext" message. Alternatively, the node may terminate the reconciliation right away by sending a "reconcildiff" message is sent with the failure flag set (success=false).
+* If the decoding succeeded, a "reconcildiff" message with success=true.
+The receiver also makes snapshot of their current reconciliation set, and clears the set itself. The snapshot is kept until a "reconcildiff" message is sent by the node. It is needed to enable sketch extension.
-The receiver also makes snapshot of their current reconciliation set, and clears the set itself. The snapshot is kept until a "reconcildiff" message is sent by the node.
+2. Sketch extension. By combining the sketch extension with the initially received sketch, an extended sketch is obtained. The receiving node then computes the extended difference sketch by combining the received extended sketch with an extended sketch computed locally over a corresponding reconciliation set snapshot. The receiving node then tries to decode the extended difference sketch and based on the result:
+* If the decoding failed, the receiving node terminates the reconciliation right away by sending a "reconcildiff" message is sent with the failure flag set (success=false).
+* If the decoding succeeded, a "reconcildiff" message with success=true.
-====reqbisec====
-The reqbisec message is used to signal that set reconciliation has failed and an extra sketch is needed to find set difference.
+In either cases, a "reconcildiff" with success=false should also be accompanied with announcing all transactions from the reconciliation set (or set snapshot if failed after extension) as a fallback to flooding.
+A "reconcildiff" with success=true should contain unknown short IDs of the transactions from the decoded difference, corresponding to the transactions missing on the sender's side. Known short IDs from the difference correspond to what the receiver of the message is missing, and they should be announced via an "inv" message.
+
+====reqsketchext====
+The reqsketchext message is used by reconciliation initiator to signal that initial set reconciliation has failed and a sketch extension is needed to find set difference.
It has an empty payload.
-Upon receipt of a "reqbisec" message, a node responds to it with a "sketch" message, which contains a sketch of a subset of corresponding reconciliation set <b>snapshot</b> (stored when "reqreconcil" message for the current round was processed) (values in range ''[0..(2^31)]'').
+Upon receipt of a "reqsketchext" message, a node responds to it with a "sketch" message, which contains a sketch extension: a sketch (of the same transactions sketched initially) of higher capacity without the part sent initially.
====reconcildiff====
-The reconcildiff message is used to announce transactions which are found to be missing during set reconciliation on the sender's side.
+The reconcildiff message is used by reconciliation initiator to announce transactions which are found to be missing during set reconciliation on the sender's side.
{|class="wikitable"
! Data type !! Name !! Description
@@ -201,57 +213,46 @@ The reconcildiff message is used to announce transactions which are found to be
| uint32[] || ask_shortids || The short IDs that the sender did not have.
|}
-Upon receipt a "reconcildiff" message with ''success=1'', a node sends a "invtx" message for the transactions requested by 32-bit IDs (first vector) containing their 128-bit truncated IDs (with parent transactions occuring before their dependencies), and can request announced transactions (second vector) it does not have via a "gettx" message.
-Otherwise if ''success=0'', receiver should request bisection via ''reqbisec'' (if failure happened for the first time).
-If failure happened for the second time, receiver should announce the transactions from the reconciliation set via an "invtx" message, excluding the transactions announced from the sender.
+Upon receipt a "reconcildiff" message with ''success=1'' (reconciliation success), a node sends an "inv" message for the transactions requested by 32-bit IDs (first vector) containing their wtxids (with parent transactions occuring before their dependencies).
+If ''success=0'' (reconciliation failure), receiver should announce all transactions from the reconciliation set via an "inv" message.
+In both cases, transactions the sender of the message thinks the receiver is missing are announced via an "inv" message.
+The regular "inv" deduplication should apply.
The <b>snapshot</b> of the corresponding reconciliation set is cleared by the sender and the receiver of the message.
-The sender should also send their own "invtx" message along with the reconcildiff message to announce transactions which are missing on the receiver's side.
-
-====invtx====
-The invtx message is used to announce transactions (both along with reconcildiff message and as a response to the reconcildiff message). It is the truncated ID analogue of "inv" (which cannot be used because it has 256-bit elements).
-
-{|class="wikitable"
-! Data type !! Name !! Description
-|-
-| uint128[] || inv_truncids || The truncated IDs of transactions the sender believes the receiver does not have.
-|}
-
-Upon receipt a "invtx" message, a node requests announced transactions it does not have.
-The <b>snapshot</b> of the corresponding reconciliation set is cleared by the sender of the message.
-
-====gettx====
-The gettx message is used to request transactions by 128-bit truncated IDs. It is the truncated ID analogue of "getdata".
-
-{|class="wikitable"
-! Data type !! Name !! Description
-|-
-| uint128[] || ask_truncids || The truncated IDs of transactions the sender wants the full transaction data for.
-|}
-
-Upon receipt a "gettx" message, a node sends "tx" messages for the requested transactions.
+The sender should also send their own "inv" message along with the reconcildiff message to announce transactions which are missing on the receiver's side.
==Local state==
This BIP suggests a stateful protocol and it requires storing several variables at every node to operate properly.
+====Reconciliation salt====
+When negotiating reconciliation support, peers send each other their contribution to the reconciliation salt (see how we construct short IDs above). These salts (or just the resulting salt) should be stored on both sides of the connection.
+
====Reconciliation sets====
-Every node stores a set of 128-bit truncated IDs for every peer which supports transaction reconciliation, representing the transactions which would have been sent according to the regular flooding protocol.
+Every node stores a set of wtxids for every peer which supports transaction reconciliation, representing the transactions which would have been sent according to the regular flooding protocol.
Incoming transactions are added to sets when those transactions are received (if they satisfy the policies such as minimum fee set by a peer).
A reconciliation set is moved to the corresponding set snapshot after the transmission of the initial sketch.
====Reconciliation set snapshot====
-After the transmitting of the initial sketch (either sending or receiving of reconcildiff message), every node should store the snapshot of the current reconciliation set, and clear the set.
-This is important to make bisection more stable during the reconciliation round (bisection should be applied to the snapshot).
-The snapshot is also used to efficiently lookup the transactions requested by short ID.
+After transmitting the initial sketch (either sending or receiving of the reconcildiff message), every node should store the snapshot of the current reconciliation set, and clear the set.
+This is important to make sketch extension more stable (extension should be computed over the set snapshot). Otherwise, extension would contain transactions received after sending out the initial sketch.
The snapshot is cleared after the end of the reconciliation round (sending or receiving of the reconcildiff message).
-====q-coefficient====
-The q value should be stored to make efficient difference estimation. It is shared across peers and changed after every reconciliation.
-q-coefficient represents the discrepancy in sets: the closer the sets are, the lower optimal ''q'' is.
-In future implementations, q could vary across different peers or become static.
+====Sketch capacity estimation and q-coefficient====
+
+Earlier we suggested that upon receiving a reconciliation request, a node should estimate the sketch capacity it should send: ''capacity=f(set_size, local_set_size, q)''.
+
+We suggest the following function: ''capacity=|set_size - local_set_size| + q * min(set_size, local_set_size) + c''.
+
+Intuitively, ''q'' represents the discrepancy in sets: the closer the sets are, the lower optimal ''q'' is.
+Per the Erlay paper, ''q'' should be derived as an optimal ''q'' value for the previous reconciliation with a given peer, once the actual set sizes and set difference are known.
+For example, if in previous round ''set_size=30'' and ''local_set_size=20'', and the *actual* difference was ''12'', then a node should compute ''q'' as following:
+''q=(12 - |30-20|) / min(30, 20)=0.1''
+The derivation of ''q'' can be changed according to the version of the protocol. For example, a static value could be chosen for simplicity. However, we suggest that ''q'' remains a parameter sent in every reconciliation request to enable future compatibility with more sophisticated (non-static) choices of this parameter.
+
+As for the ''c'' parameter, it is suggested to use ''c=1'' to avoid sending empty sketches and reduce the overhead caused by under-estimations.
==Backward compatibility==
@@ -261,32 +262,36 @@ Clients which do not implement this protocol remain fully compatible after this
==Rationale==
-====Why using PinSketch for set reconciliation?====
+====Why use PinSketch for set reconciliation?====
PinSketch is more bandwidth efficient than IBLT, especially for the small differences in sets we expect to operate over.
PinSketch is as bandwidth efficient as CPISync, but PinSketch has quadratic decoding complexity, while CPISync have cubic decoding complexity. This makes PinSketch significantly faster.
-====Why using 32-bit short transaction IDs?====
+====Why use 32-bit short transaction IDs?====
To use Minisketch in practice, transaction IDs should be shortened (ideally, not more than 64 bits per element).
-Small number of bits per transaction also allows to save extra bandwidth and make operations over sketches faster.
+A small number of bits per transaction also allows saving extra bandwidth and make operations over sketches faster.
According to our estimates, 32 bits provides low collision rate in a non-adversarial model (which is enabled by using independent salts per-link).
-====Why using 128-bit short IDs?====
+====Why use sketch extensions instead of bisection?====
+
+Bisection is an alternative to sketch extensions, per which a second sketch with the same initial capacity is computed over half of the txID space.
+Due to the linearity of sketches, transmitting just this one allows a reconciliation initiator to compute the sketch of the same capacity of another half. Two sketches allow the initiator to reconstruct twice as many differences as was allowed by an initial sketch.
+
+In practice this allows the initiator to amortize the bandwidth overhead of initial reconciliation failure, similarly to extension sketches, making the overhead negligible.
+
+The main benefit of sketch extensions is a much simpler implementation. Implementing bisection is hard (see [https://github.com/naumenkogs/bitcoin/commit/b5c92a41e4cc0599504cf838d20212f1a403e573 implementation]) because, in the end, we have to operate with two sketches and handle scenarios where one sketch decoded and another sketch failed.
-To avoid problems caused by the delays in the network, our protocol requires extra round of announcing unsalted transaction IDs. [https://arxiv.org/pdf/1905.10518.pdf Erlay] protocol on top of this work also requires announcing unsalted transaction IDs for flooding.
-Both of these measures allow to deduplicate transaction announcements across the peers.
-However, using full 256-bit IDs to uniquely identify transactions seems to be an overkill.
-128 is the highest power of 2 which provides good enough collision-resistance in an adversarial model, and trivially saves a significant portion of the bandwidth related to these announcements.
+It becomes even more difficult if in the future we decide to allow more than one extension/bisection. Bisection in this case have to be recursive (and spawn 4/8/16/... sketches), while for extensions we always end up with one extended sketch.
-====Why using bisection instead of extending the sketch?====
+Sketch extensions are also more flexible: extending a sketch of capacity 10 with 4 more means just computing a sketch of capacity 14 and sending the extension, while for bisection increasing the capacity to something different than 10*2/10*4/10*8/... is sophisticated implementation-wise.
-Unlike extended sketches, bisection does not require operating over sketches of higher order.
-This allows to avoid the high computational cost caused by quadratic decoding complexity.
+The only advantage of bisection is that it doesn't require computing sketches of higher capacities (exponential cost). We believe that since
+the protocol is currently designed to operate in the conditions where sketches usually have at most the capacity of 20, this efficiency is not crucial.
==Implementation==
-TODO
+https://github.com/bitcoin/bitcoin/pull/21515
==Acknowledgments==