Merge pull request #185 from sipa/202001_shifted_leaf_v

Rewrite leaf versions rationale
author: Pieter Wuille <pieter.wuille@gmail.com> 2020-01-13 10:26:14 -0500
committer: GitHub <noreply@github.com> 2020-01-13 10:26:14 -0500
commit: bdef41ab58f966bd8680b4589652dd29262201bb (patch)
tree: 9b1a3645f870d858943d6e0488577b9a211b5d92
parent: 6a36adea8dfc512afe79793f2f6150395804733e (diff)
parent: f978178d6b8738425d27fb1c007cee6ac136efc4 (diff)
download: bips-bdef41ab58f966bd8680b4589652dd29262201bb.tar.xz
2 files changed, 7 insertions, 11 deletions
diff --git a/bip-taproot.mediawiki b/bip-taproot.mediawiki
index 6090a71..7d2318e 100644
--- a/bip-taproot.mediawiki
+++ b/bip-taproot.mediawiki
@@ -59,19 +59,15 @@ The following rules only apply when such an output is being spent. Any other out
 
 * Let ''q'' be the 32-byte array containing the witness program (the second push in the scriptPubKey) which represents a public key according to bip-schnorr.
 * Fail if the witness stack has 0 elements.
-* If there are at least two witness elements, and the first byte of the last element is 0x50<ref>'''Why is the first byte of the annex <code>0x50</code>?''' Like the <code>0xc0</code> constant, <code>0x50</code> is chosen as it could not be confused with a valid P2WPKH or P2WSH spending. As the control block's initial byte's lowest bit is used to indicate the public key's Y squareness, each script version needs two subsequence byte values that are both not yet used in P2WPKH or P2WSH spending. To indicate the annex, only an "unpaired" available byte is necessary like <code>0x50</code>. This choice maximizes the available options for future script versions.</ref>, this last element is called ''annex'' ''a''<ref>'''What is the purpose of the annex?''' The annex is a reserved space for future extensions, such as indicating the validation costs of computationally expensive new opcodes in a way that is recognizable without knowing the outputs being spent. Until the meaning of this field is defined by another softfork, users SHOULD NOT include <code>annex</code> in transactions, or it may lead to PERMANENT FUND LOSS.</ref> and is removed from the witness stack. The annex (or the lack of thereof) is always covered by the transaction digest and contributes to transaction weight, but is otherwise ignored during taproot validation.
+* If there are at least two witness elements, and the first byte of the last element is 0x50<ref>'''Why is the first byte of the annex <code>0x50</code>?''' The <code>0x50</code> is chosen as it could not be confused with a valid P2WPKH or P2WSH spending. As the control block's initial byte's lowest bit is used to indicate the public key's Y squareness, each leaf version needs an even byte value and the immediately following odd byte value that are both not yet used in P2WPKH or P2WSH spending. To indicate the annex, only an "unpaired" available byte is necessary like <code>0x50</code>. This choice maximizes the available options for future script versions.</ref>, this last element is called ''annex'' ''a''<ref>'''What is the purpose of the annex?''' The annex is a reserved space for future extensions, such as indicating the validation costs of computationally expensive new opcodes in a way that is recognizable without knowing the scriptPubKey of the output being spent. Until the meaning of this field is defined by another softfork, users SHOULD NOT include <code>annex</code> in transactions, or it may lead to PERMANENT FUND LOSS.</ref> and is removed from the witness stack. The annex (or the lack of thereof) is always covered by the transaction digest and contributes to transaction weight, but is otherwise ignored during taproot validation.
 * If there is exactly one element left in the witness stack, key path spending is used:
 ** The single witness stack element is interpreted as the signature and must be valid (see the next section) for the public key ''q'' and taproot transaction digest (to be defined hereinafter) as message. Fail if it is not. Otherwise pass.
 * If there are at least two witness elements left, script path spending is used:
 ** Call the second-to-last stack element ''s'', the script.
 ** The last stack element is called the control block ''c'', and must have length ''33 + 32m'', for a value of ''m'' that is an integer between 0 and 128<ref>'''Why is the Merkle path length limited to 128?''' The optimally space-efficient Merkle tree can be constructed based on the probabilities of the scripts in the leaves, using the Huffman algorithm. This algorithm will construct branches with lengths approximately equal to ''log<sub>2</sub>(1/probability)'', but to have branches longer than 128 you would need to have scripts with an execution chance below 1 in ''2<sup>128</sup>''. As that is our security bound, scripts that truly have such a low chance can probably be removed entirely.</ref>, inclusive. Fail if it does not have such a length.
 ** Let ''p = c[1:33]'' and let ''P = point(p)'' where ''point'' and ''[:]'' are defined as in bip-schnorr. Fail if this point is not on the curve.
-** Let ''l = c[0] & 0xfe'', the leaf version<ref>'''What is the purpose of the first byte of the control block?''' The first byte of the control block has three distinct functions:
-* The low bit is used to denote whether the point represented by public key ''q'' is negated before verifying the taproot tweak.<ref>'''Why is it necessary to reveal a bit to indicate if the point represented by the output public key is negated in a script path spend?''' The ''point'' function (defined in bip-schnorr) always constructs a point with a square Y coordinate, but because ''Q'' is constructed by adding the taproot tweak to the internal public key ''P'', it cannot easily be guaranteed that ''Q'' in fact has such a Y coordinate. Therefore, before verifying the taproot tweak the original point is restored by negating if necessary. We can not ignore the Y coordinate because it would prevent batch verification. Trying out multiple internal keys until there's such a ''Q'' is possible but undesirable and unnecessary since this information about the Y coordinate only consumes an unused bit.</ref>
-* By keeping the top two bits set to true, it can be guaranteed that scripts can be recognized without knowledge of the UTXO being spent, simplifying analysis. This is because such values cannot occur as first byte of the final stack element in either P2WPKH or P2WSH spends.
-* The remaining five bits are used for introducing new script versions that are not observable unless actually executed.
-</ref>.
-** Let ''k<sub>0</sub> = hash<sub>TapLeaf</sub>(l || compact_size(size of s) || s)''; also call it the ''tapleaf hash''.
+** Let ''v = c[0] & 0xfe'' and call it the ''leaf version''<ref>'''What constraints are there on the leaf version?''' First, the leaf version cannot be odd as ''c[0] & 0xfe'' will always be even, and cannot be ''0x50'' as that would result in ambiguity with the annex. In addition, in order to support some forms of static analysis that rely on being able to identify script spends without access to the output being spent, it is recommended to avoid using any leaf versions that would conflict with a valid first byte of either a valid P2WPKH pubkey or a valid P2WSH script (that is, both ''v'' and ''v | 1'' should be an undefined, invalid or disabled opcode or an opcode that is not valid as the first opcode). The values that comply to this rule are the 32 even values between ''0xc0'' and ''0xfe'' and also ''0x66'', ''0x7e'', ''0x80'', ''0x84'', ''0x96'', ''0x98'', ''0xba'', ''0xbc'', ''0xbe''. Note also that this constraint implies that leaf versions should be shared amongst different witness versions, as knowing the witness version requires access to the output being spent.</ref>.
+** Let ''k<sub>0</sub> = hash<sub>TapLeaf</sub>(v || compact_size(size of s) || s)''; also call it the ''tapleaf hash''.
 ** For ''j'' in ''[0,1,...,m-1]'':
 *** Let ''e<sub>j</sub> = c[33+32j:65+32j]''.
 *** Let ''k<sub>j+1</sub> depend on whether ''k<sub>j</sub> < e<sub>j</sub>'' (lexicographically)<ref>'''Why are child elements sorted before hashing in the Merkle tree?''' By doing so, it is not necessary to reveal the left/right directions along with the hashes in revealed Merkle branches. This is possible because we do not actually care about the position of specific scripts in the tree; only that they are actually committed to.</ref>:
@@ -79,9 +75,9 @@ The following rules only apply when such an output is being spent. Any other out
 **** If ''k<sub>j</sub> &ge; e<sub>j</sub>'': ''k<sub>j+1</sub> = hash<sub>TapBranch</sub>(e<sub>j</sub> || k<sub>j</sub>)''.
 ** Let ''t = hash<sub>TapTweak</sub>(p || k<sub>m</sub>)''.
 ** If ''t &ge; 0xFFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFE BAAEDCE6 AF48A03B BFD25E8C D0364141'' (order of secp256k1), fail.
-** Let ''Q = point(q) if (c[0] & 1) = 1 and -point(q) otherwise''. Fail if this point is not on the curve.
+** Let ''Q = point(q) if (c[0] & 1) = 1 and -point(q) otherwise''<ref>'''Why is it necessary to reveal a bit to indicate if the point represented by the output public key is negated in a script path spend?''' The ''point'' function (defined in bip-schnorr) always constructs a point with a square Y coordinate, but because ''Q'' is constructed by adding the taproot tweak to the internal public key ''P'', it cannot easily be guaranteed that ''Q'' in fact has such a Y coordinate. Therefore, before verifying the taproot tweak the original point is restored by negating if necessary. We can not ignore the Y coordinate because it would prevent batch verification. Trying out multiple internal keys until there's such a ''Q'' is possible but undesirable and unnecessary since this information about the Y coordinate only consumes an unused bit.</ref>. Fail if this point is not on the curve.
 ** If ''Q &ne; P + int(t)G'', fail.
-** Execute the script, according to the applicable script rules<ref>'''What are the applicable script rules in script path spends?''' Bip-tapscript specifies validity rules that apply if the leaf version is ''0xc0'', but future proposals can introduce rules for other leaf versions.</ref>, using the witness stack elements excluding the script ''s'', the control block ''c'', and the annex ''a'' if present, as initial stack.
+** Execute the script, according to the applicable script rules<ref>'''What are the applicable script rules in script path spends?''' Bip-tapscript specifies validity rules that apply for leaf version 0xc0, but future proposals can introduce rules for other leaf versions.</ref>, using the witness stack elements excluding the script ''s'', the control block ''c'', and the annex ''a'' if present, as initial stack.
 
 ''q'' is referred to as ''taproot output key'' and ''p'' as ''taproot internal key''.
 
@@ -237,7 +233,7 @@ def taproot_output_script(internal_pubkey, script_tree):
 
 To spend this output using script ''D'', the control block would contain the following data in this order:
 
-     <control byte with leaf version> <internal key p> <C> <E> <AB>
+     <control byte with leaf version and negation bit> <internal key p> <C> <E> <AB>
 
 The TapTweak would then be computed as described in [[#script-validation-rules]] like so:
 
diff --git a/bip-tapscript.mediawiki b/bip-tapscript.mediawiki
index 759d0d8..e9c4278 100644
--- a/bip-tapscript.mediawiki
+++ b/bip-tapscript.mediawiki
@@ -51,7 +51,7 @@ The rules below only apply when validating a transaction input for which all of
 * The transaction input is a '''segregated witness spend''' (i.e., the scriptPubKey contains a witness program as defined in [https://github.com/bitcoin/bips/blob/master/bip-0141.mediawiki BIP141]).
 * It is a '''taproot spend''' as defined in bip-taproot (i.e., the witness version is 1, the witness program is 32 bytes, and it is not P2SH wrapped).
 * It is a '''script path spend''' as defined in bip-taproot (i.e., after removing the optional annex from the witness stack, two or more stack elements remain).
-* The leaf version is ''0xc0'' (i.e. the first byte of the last witness element after removing the optional annex is ''0xc0'' or ''0xc1'')<ref>'''How is the ''0xc0'' constant chosen?''' Following the guidelines in bip-taproot, by choosing a value having the two top bits set, tapscript spends are identifiable even without access to the UTXO being spent.</ref>, marking it as a '''tapscript spend'''.
+* The leaf version is ''0xc0'' (i.e. the first byte of the last witness element after removing the optional annex is ''0xc0'' or ''0xc1''), marking it as a '''tapscript spend'''.
 
 Validation of such inputs must be equivalent to performing the following steps in the specified order.
 # If the input is invalid due to BIP141 or bip-taproot, fail.
author	Pieter Wuille <pieter.wuille@gmail.com>	2020-01-13 10:26:14 -0500
committer	GitHub <noreply@github.com>	2020-01-13 10:26:14 -0500
commit	bdef41ab58f966bd8680b4589652dd29262201bb (patch)
tree	9b1a3645f870d858943d6e0488577b9a211b5d92
parent	6a36adea8dfc512afe79793f2f6150395804733e (diff)
parent	f978178d6b8738425d27fb1c007cee6ac136efc4 (diff)
download	bips-bdef41ab58f966bd8680b4589652dd29262201bb.tar.xz