Diffstat (limited to 'bip-0069.mediawiki')
-rw-r--r-- | bip-0069.mediawiki | 68 |
1 files changed, 52 insertions, 16 deletions
diff --git a/bip-0069.mediawiki b/bip-0069.mediawiki
index 77c1fb9..d4a9a23 100644
--- a/bip-0069.mediawiki
+++ b/bip-0069.mediawiki
@@ -9,7 +9,10 @@

 ==Abstract==

-Currently there is no standard for bitcoin wallet clients when ordering transaction inputs and outputs. As a result, wallet clients often have a discernible blockchain fingerprint, and can leak private information about their users. By contrast, a standard for non-deterministic sorting could be difficult to audit. This document proposes deterministic lexicographical sorting, using hashes of previous transactions and output indices to sort transaction inputs, as well as values and scriptPubKeys to sort transaction outputs.
+Currently there is no standard for bitcoin wallet clients when ordering transaction inputs and outputs.
+As a result, wallet clients often have a discernible blockchain fingerprint, and can leak private information about their users.
+By contrast, a standard for non-deterministic sorting could be difficult to audit.
+This document proposes deterministic lexicographical sorting, using hashes of previous transactions and output indices to sort transaction inputs, as well as values and scriptPubKeys to sort transaction outputs.

 ==Copyright==

@@ -17,19 +20,41 @@ This BIP is in the public domain.

 ==Motivation==

-Currently, there is no clear standard for how wallet clients ought to order transaction inputs and outputs. Since wallet clients are left to their own devices to determine this ordering, they often leak information about their users’ finances. For example, a wallet client might naively order inputs based on when addresses were added to a wallet by the user through importing or random generation. Many wallets will place spending outputs first and change outputs second, leaking information about both the sender and receiver’s finances to passive blockchain observers. Such information should remain private not only for the benefit of consumers, but in higher order financial systems must be kept secret to prevent fraud. A researcher recently demonstrated this principle when he detected that Bitstamp leaked information when creating exchange transactions, enabling potential espionage among traders. [1]
-
-One way to address these privacy weaknesses is by randomly ordering inputs and outputs. [2] After all, the order of inputs and outputs does not impact the function of the transaction they belong to, making random sorting viable. Unfortunately, it can be difficult to prove that this sorting process is genuinely randomly sorted based on code or run-time analysis, especially if the software is closed source. A malicious software developer can abuse the ordering of inputs and outputs as a side channel of leaking information. For example, if an attacker can patch a victim’s HD wallet client to order inputs and outputs based on the bits of a master private key, then the attacker can eventually steal all of the victim’s funds by monitoring the blockchain. Non-deterministic methods of sorting are difficult to audit because they are not repeatable.
-
-The lack of standardization between wallet clients when ordering inputs and outputs can yield predictable quirks that characterize particular wallet clients or services. Such quirks create unique fingerprints that a privacy attacker can employ through simple passive blockchain observation.
-
-The solution is to create an algorithm for sorting transaction inputs and outputs that is deterministic. Since it is deterministic, it should also be unambiguous — that is, given a particular transaction, the proper order of inputs and outputs should be obvious. To make this standard as widely applicable as possible, it should rely on information that is downloaded by both full nodes (with or without typical efficiency techniques such as pruning) and SPV nodes. In order to ensure that it does not leak confidential data, it must rely on information that is publicly accessible through the blockchain. The use of public blockchain information also allows a transaction to be sorted even when it is a multi-party transaction, such as in the example of a CoinJoin.
+Currently, there is no clear standard for how wallet clients ought to order transaction inputs and outputs.
+Since wallet clients are left to their own devices to determine this ordering, they often leak information about their users’ finances.
+For example, a wallet client might naively order inputs based on when addresses were added to a wallet by the user through importing or random generation.
+Many wallets will place spending outputs first and change outputs second, leaking information about both the sender and receiver’s finances to passive blockchain observers.
+Such information should remain private not only for the benefit of consumers, but in higher order financial systems must be kept secret to prevent fraud.
+Currently, there is no clear standard for how wallet clients ought to order transaction inputs and outputs.
+Since wallet clients are left to their own devices to determine this ordering, they often leak information about their users’ finances.
+For example, a wallet client might naively order inputs based on when addresses were added to a wallet by the user through importing or random generation.
+Many wallets will place spending outputs first and change outputs second, leaking information about both the sender and receiver’s finances to passive blockchain observers.
+Such information should remain private not only for the benefit of consumers, but in higher order financial systems must be kept secret to prevent fraud.
+A researcher recently demonstrated this principle when he detected that Bitstamp leaked information when creating exchange transactions, enabling potential espionage among traders. [1]
+
+One way to address these privacy weaknesses is by randomly ordering inputs and outputs. [2]
+After all, the order of inputs and outputs does not impact the function of the transaction they belong to, making random sorting viable.
+Unfortunately, it can be difficult to prove that this sorting process is genuinely randomly sorted based on code or run-time analysis, especially if the software is closed source.
+A malicious software developer can abuse the ordering of inputs and outputs as a side channel of leaking information.
+For example, if an attacker can patch a victim’s HD wallet client to order inputs and outputs based on the bits of a master private key, then the attacker can eventually steal all of the victim’s funds by monitoring the blockchain.
+Non-deterministic methods of sorting are difficult to audit because they are not repeatable.
+
+The lack of standardization between wallet clients when ordering inputs and outputs can yield predictable quirks that characterize particular wallet clients or services.
+Such quirks create unique fingerprints that a privacy attacker can employ through simple passive blockchain observation.
+
+The solution is to create an algorithm for sorting transaction inputs and outputs that is deterministic.
+Since it is deterministic, it should also be unambiguous — that is, given a particular transaction, the proper order of inputs and outputs should be obvious.
+To make this standard as widely applicable as possible, it should rely on information that is downloaded by both full nodes (with or without typical efficiency techniques such as pruning) and SPV nodes.
+In order to ensure that it does not leak confidential data, it must rely on information that is publicly accessible through the blockchain.
+The use of public blockchain information also allows a transaction to be sorted even when it is a multi-party transaction, such as in the example of a CoinJoin.

 ==Specification==

 ===Applicability===

-This BIP applies to any transaction for which the order of its inputs and outputs does not impact the transaction’s function. Currently, this refers to any transaction that employs the SIGHASH_ALL signature hash type, in which signatures commit to the exact order of inputs and outputs. Transactions that use SIGHASH_ANYONECANPAY and/or SIGHASH_NONE may include inputs and/or outputs that are not signed; however, compliant software should still emit transactions with lexicographically sorted inputs and outputs, even though they may later be modified by others.
+This BIP applies to any transaction for which the order of its inputs and outputs does not impact the transaction’s function.
+Currently, this refers to any transaction that employs the SIGHASH_ALL signature hash type, in which signatures commit to the exact order of inputs and outputs.
+Transactions that use SIGHASH_ANYONECANPAY and/or SIGHASH_NONE may include inputs and/or outputs that are not signed; however, compliant software should still emit transactions with lexicographically sorted inputs and outputs, even though they may later be modified by others.

 In the event that future protocol upgrades introduce new signature hash types, compliant software should apply the lexicographical ordering principle analogously.

@@ -37,7 +62,8 @@ While out of scope of this BIP, protocols that do require a specified order of i

 ===Lexicographical Sorting===

-Ordering of inputs and outputs will rely on the output of sorting functions. These functions can be defined as taking two inputs or two outputs as parameters and returning their appropriate ordering with respect to each other.
+Ordering of inputs and outputs will rely on the output of sorting functions.
+These functions can be defined as taking two inputs or two outputs as parameters and returning their appropriate ordering with respect to each other.

 Byte arrays must be sorted with an algorithm that produces the same output as the following comparison algorithm:

@@ -62,19 +88,28 @@ N.B. These comparisons do not need to operate in constant time since they are no

 ===Transaction Inputs===

-Transaction inputs are defined by the hash of a previous transaction, the output index of of a UTXO from that previous transaction, the size of an unlocking script, the unlocking script, and a sequence number. [3] For sorting inputs, the hash of the previous transaction and the output index within that transaction are sufficient for sorting purposes; each transaction hash has an extremely high probability of being unique in the blockchain — this is enforced for coinbase transactions by BIP30 — and output indices within a transaction are unique. For the sake of efficiency, transaction hashes should be compared first before output indices, since output indices from different transactions are often equivalent, while all bytes of the transaction hash are effectively random variables.
+Transaction inputs are defined by the hash of a previous transaction, the output index of of a UTXO from that previous transaction, the size of an unlocking script, the unlocking script, and a sequence number. [3]
+For sorting inputs, the hash of the previous transaction and the output index within that transaction are sufficient for sorting purposes; each transaction hash has an extremely high probability of being unique in the blockchain — this is enforced for coinbase transactions by BIP30 — and output indices within a transaction are unique.
+For the sake of efficiency, transaction hashes should be compared first before output indices, since output indices from different transactions are often equivalent, while all bytes of the transaction hash are effectively random variables.

-Hashes of previous transactions are considered for the purposes of this BIP in their little-endian, byte array format in order to match the traditional, human-readable string representation of the hashes. They must be sorted in accordance with the output of the bytearr_cmp() function above: the hash with the earliest lesser byte is ordered first, and shorter hashes are ordered before longer ones as a tie-breaker. In the event of two matching transaction hashes, output indices will be compared based on their integer value, with the smaller value ordered first. A further tie is extremely improbable for the aforementioned reasons.
+Hashes of previous transactions are considered for the purposes of this BIP in their little-endian, byte array format in order to match the traditional, human-readable string representation of the hashes.
+They must be sorted in accordance with the output of the bytearr_cmp() function above: the hash with the earliest lesser byte is ordered first, and shorter hashes are ordered before longer ones as a tie-breaker.
+In the event of two matching transaction hashes, output indices will be compared based on their integer value, with the smaller value ordered first.
+A further tie is extremely improbable for the aforementioned reasons.

 Because the hash of previous transactions and output indices must be included in a signed transaction, wallet clients capable of signing transactions will necessarily have access to this data.

-Transaction malleability will not negatively impact the correctness of this process. Even if a wallet client follows this process using unconfirmed UTXOs as inputs and an attacker changes modifies the blockchain’s record of the hash of the previous transaction, the wallet client will include the invalidated previous transaction hash in its input data, and will still correctly sort with respect to that invalidated hash.
+Transaction malleability will not negatively impact the correctness of this process.
+Even if a wallet client follows this process using unconfirmed UTXOs as inputs and an attacker changes modifies the blockchain’s record of the hash of the previous transaction, the wallet client will include the invalidated previous transaction hash in its input data, and will still correctly sort with respect to that invalidated hash.

 ===Transaction Outputs===

-A transaction output is defined by its scriptPubKey and amount. [3] For sorting purposes, we will consider a scriptPubKey in its byte array representation, and a bitcoin amount in terms of their integer number of satoshis (smallest amount ordered first).
+A transaction output is defined by its scriptPubKey and amount. [3]
+For sorting purposes, we will consider a scriptPubKey in its byte array representation, and a bitcoin amount in terms of their integer number of satoshis (smallest amount ordered first).

-For the sake of efficiency, amounts will be considered first for sorting, since they contain fewer bytes of information (8 bytes) compared to a standard P2PKH scriptPubKey (25 bytes). [4] When the values are tied, the scriptPubKey is then considered. In the event of a tie between scriptPubKeys, sorting is irrelevant since the outputs are exactly equivalent.
+For the sake of efficiency, amounts will be considered first for sorting, since they contain fewer bytes of information (8 bytes) compared to a standard P2PKH scriptPubKey (25 bytes). [4]
+When the values are tied, the scriptPubKey is then considered.
+In the event of a tie between scriptPubKeys, sorting is irrelevant since the outputs are exactly equivalent.

 ===Examples===

@@ -138,4 +173,5 @@ Outputs:

 ==Acknowledgements==

-Danno Ferrin <danno@numisight.com>, Sergio Demian Lerner <sergiolerner@certimix.com>, Justus Ranvier <justus@openbitcoinprivacyproject.org>, and Peter Todd <pete@petertodd.org> contributed to the design and motivations for this BIP. A similar proposal was submitted to the Bitcoin-dev mailing list independently by Rusty Russell <rusty@rustcorp.com.au>
\ No newline at end of file
+Danno Ferrin <danno@numisight.com>, Sergio Demian Lerner <sergiolerner@certimix.com>, Justus Ranvier <justus@openbitcoinprivacyproject.org>, and Peter Todd <pete@petertodd.org> contributed to the design and motivations for this BIP.
+A similar proposal was submitted to the Bitcoin-dev mailing list independently by Rusty Russell <rusty@rustcorp.com.au>
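
Reviewer note: the ordering rules in the Transaction Inputs and Transaction Outputs hunks can be sketched in a few lines of Python. This is a minimal illustration, not the BIP's reference code. The tuple layouts and the names `sort_inputs` and `sort_outputs` are hypothetical, and it assumes previous-transaction hashes are supplied as hex strings in their human-readable form. Python's built-in ordering on `bytes` compares unsigned byte values left to right and places a shorter prefix first, which matches the behaviour the text attributes to bytearr_cmp().

```python
from typing import List, Tuple

# Hypothetical shapes chosen for illustration only:
#   input  = (previous tx hash as a hex string in display order, output index)
#   output = (amount in satoshis, scriptPubKey as raw bytes)
TxInput = Tuple[str, int]
TxOutput = Tuple[int, bytes]


def sort_inputs(inputs: List[TxInput]) -> List[TxInput]:
    """Order by previous tx hash (bytewise), then by output index (integer)."""
    return sorted(inputs, key=lambda i: (bytes.fromhex(i[0]), i[1]))


def sort_outputs(outputs: List[TxOutput]) -> List[TxOutput]:
    """Order by amount (smallest first), then by scriptPubKey (bytewise)."""
    return sorted(outputs, key=lambda o: (o[0], o[1]))


if __name__ == "__main__":
    low_hash = "00" * 31 + "01"
    high_hash = "ff" * 32
    print(sort_inputs([(high_hash, 0), (low_hash, 1), (low_hash, 0)]))
    print(sort_outputs([(200, b"\x76\xa9"), (100, b"\xff"), (100, b"\x00\x01")]))
```

Because both sort keys are derived only from data already present in the transaction, any party to a multi-party transaction would compute the same order.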