diff options
author | Ryan Ofsky <ryan@ofsky.org> | 2024-08-04 21:20:39 -0400 |
---|---|---|
committer | Ryan Ofsky <ryan@ofsky.org> | 2024-08-04 22:27:10 -0400 |
commit | 1a7d20509f46f0cd38302da78dba9a0c9bb24226 (patch) | |
tree | f281bc377a68b2a6ed4a7d1709a4c65776af62e8 | |
parent | 55d19945efbb58200c37fb8322ed3494bf67f9b1 (diff) | |
parent | 73e3fa10b4dd63b7767d6b6f270df66971067341 (diff) |
Merge bitcoin/bitcoin#30526: doc: Correct uint256 hex string endianness
73e3fa10b4dd63b7767d6b6f270df66971067341 doc + test: Correct uint256 hex string endianness (Hodlinator)
Pull request description:
This PR is a follow-up to #30436.
Only changes test-code and modifies/adds comments.
Byte order of hex string representation was wrongfully documented as little-endian, but are in fact closer to "big-endian" (endianness is a memory-order concept rather than a numeric concept). `[arith_]uint256` both store their data in arrays with little-endian byte order (`arith_uint256` has host byte order within each `uint32_t` element).
**uint256_tests.cpp** - Avoid using variable from the left side of the condition in the right side. Credits to @maflcko: https://github.com/bitcoin/bitcoin/pull/30436#discussion_r1688273553
**setup_common.cpp** - Skip needless ArithToUint256-conversion. Credits to @stickies-v: https://github.com/bitcoin/bitcoin/pull/30436#discussion_r1688621638
---
<details>
<summary>
## Logical reasoning for endianness
</summary>
1. Comparing an `arith_uint256` (`base_uint<256>`) to a `uint64_t` compares the beginning of the array, and verifies the remaining elements are zero.
```C++
template <unsigned int BITS>
bool base_uint<BITS>::EqualTo(uint64_t b) const
{
for (int i = WIDTH - 1; i >= 2; i--) {
if (pn[i])
return false;
}
if (pn[1] != (b >> 32))
return false;
if (pn[0] != (b & 0xfffffffful))
return false;
return true;
}
```
...that is consistent with little endian ordering of the array.
2. They have the same endianness (but `arith_*` has host-ordering of each `uint32_t` element):
```C++
arith_uint256 UintToArith256(const uint256 &a)
{
arith_uint256 b;
for(int x=0; x<b.WIDTH; ++x)
b.pn[x] = ReadLE32(a.begin() + x*4);
return b;
}
```
### String conversions
The reversal of order which happens when converting hex-strings <=> uint256 means strings are actually closer to big-endian, see the end of `base_blob<BITS>::SetHexDeprecated`:
```C++
unsigned char* p1 = m_data.data();
unsigned char* pend = p1 + WIDTH;
while (digits > 0 && p1 < pend) {
*p1 = ::HexDigit(trimmed[--digits]);
if (digits > 0) {
*p1 |= ((unsigned char)::HexDigit(trimmed[--digits]) << 4);
p1++;
}
}
```
Same reversal here:
```C++
template <unsigned int BITS>
std::string base_blob<BITS>::GetHex() const
{
uint8_t m_data_rev[WIDTH];
for (int i = 0; i < WIDTH; ++i) {
m_data_rev[i] = m_data[WIDTH - 1 - i];
}
return HexStr(m_data_rev);
}
```
It now makes sense to me that `SetHexDeprecated`, upon receiving a shorter hex string that requires zero-padding, would pad as if the missing hex chars where towards the end of the little-endian byte array, as they are the most significant bytes. "Big-endian" string representation is also consistent with the case where `SetHexDeprecated` receives too many hex digits and discards the leftmost ones, as a form of integer narrowing takes place.
### How I got it wrong in #30436
Previously I used the less than (`<`) comparison to prove endianness, but for `uint256` it uses `memcmp` and thereby gives priority to the *lower* bytes at the beginning of the array.
```C++
constexpr int Compare(const base_blob& other) const { return std::memcmp(m_data.data(), other.m_data.data(), WIDTH); }
```
`arith_uint256` is different in that it begins by comparing the bytes from the end, as it is using little endian representation, where the bytes toward the end are more significant.
```C++
template <unsigned int BITS>
int base_uint<BITS>::CompareTo(const base_uint<BITS>& b) const
{
for (int i = WIDTH - 1; i >= 0; i--) {
if (pn[i] < b.pn[i])
return -1;
if (pn[i] > b.pn[i])
return 1;
}
return 0;
}
```
(The commit documents that `base_blob::Compare()` is doing lexicographic ordering unlike the `arith_*`-variant which is doing numeric ordering).
</details>
ACKs for top commit:
paplorinc:
ACK 73e3fa10b4dd63b7767d6b6f270df66971067341
ryanofsky:
Code review ACK 73e3fa10b4dd63b7767d6b6f270df66971067341
Tree-SHA512: 121630c37ab01aa7f7097f10322ab37da3cbc0696a6bbdbf2bbd6db180dc5938c7ed91003aaa2df7cf4a4106f973f5118ba541b5e077cf3588aa641bbd528f4e
-rw-r--r-- | src/arith_uint256.h | 2 | ||||
-rw-r--r-- | src/test/arith_uint256_tests.cpp | 11 | ||||
-rw-r--r-- | src/test/uint256_tests.cpp | 41 | ||||
-rw-r--r-- | src/test/util/setup_common.cpp | 2 | ||||
-rw-r--r-- | src/uint256.h | 37 |
5 files changed, 70 insertions, 23 deletions
diff --git a/src/arith_uint256.h b/src/arith_uint256.h index 538fbccab9..38b7453034 100644 --- a/src/arith_uint256.h +++ b/src/arith_uint256.h @@ -196,6 +196,7 @@ public: return ret; } + /** Numeric ordering (unlike \ref base_blob::Compare) */ int CompareTo(const base_uint& b) const; bool EqualTo(uint64_t b) const; @@ -218,6 +219,7 @@ public: friend inline bool operator==(const base_uint& a, uint64_t b) { return a.EqualTo(b); } friend inline bool operator!=(const base_uint& a, uint64_t b) { return !a.EqualTo(b); } + /** Hex encoding of the number (with the most significant digits first). */ std::string GetHex() const; std::string ToString() const; diff --git a/src/test/arith_uint256_tests.cpp b/src/test/arith_uint256_tests.cpp index 10028c7c93..f178499299 100644 --- a/src/test/arith_uint256_tests.cpp +++ b/src/test/arith_uint256_tests.cpp @@ -3,6 +3,7 @@ // file COPYING or http://www.opensource.org/licenses/mit-license.php. #include <arith_uint256.h> +#include <test/util/setup_common.h> #include <uint256.h> #include <boost/test/unit_test.hpp> @@ -22,7 +23,8 @@ static inline arith_uint256 arith_uint256V(const std::vector<unsigned char>& vch { return UintToArith256(uint256(vch)); } -static inline arith_uint256 arith_uint256S(const std::string& str) { return UintToArith256(uint256S(str)); } +// Takes a number written in hex (with most significant digits first). +static inline arith_uint256 arith_uint256S(std::string_view str) { return UintToArith256(uint256S(str)); } const unsigned char R1Array[] = "\x9c\x52\x4a\xdb\xcf\x56\x11\x12\x2b\x29\x12\x5e\x5d\x35\xd2\xd2" @@ -104,6 +106,7 @@ BOOST_AUTO_TEST_CASE( basics ) // constructors, equality, inequality BOOST_CHECK(arith_uint256S(R1L.ToString()) == R1L); BOOST_CHECK(arith_uint256S(" 0x" + R1L.ToString() + " ") == R1L); BOOST_CHECK(arith_uint256S("") == ZeroL); + BOOST_CHECK(arith_uint256S("1") == OneL); BOOST_CHECK(R1L == arith_uint256S(R1ArrayHex)); BOOST_CHECK(arith_uint256(R1L) == R1L); BOOST_CHECK((arith_uint256(R1L^R2L)^R2L) == R1L); @@ -278,6 +281,12 @@ BOOST_AUTO_TEST_CASE( comparison ) // <= >= < > BOOST_CHECK( R1L <= TmpL ); BOOST_CHECK( (R1L == TmpL) != (R1L < TmpL)); BOOST_CHECK( (TmpL == R1L) || !( R1L >= TmpL)); BOOST_CHECK(! (TmpL < R1L)); BOOST_CHECK(! (R1L > TmpL)); } + + BOOST_CHECK_LT(ZeroL, + OneL); + // Verify hex number representation has the most significant digits first. + BOOST_CHECK_LT(arith_uint256S("0000000000000000000000000000000000000000000000000000000000000001"), + arith_uint256S("1000000000000000000000000000000000000000000000000000000000000000")); } BOOST_AUTO_TEST_CASE( plusMinus ) diff --git a/src/test/uint256_tests.cpp b/src/test/uint256_tests.cpp index 04f8682b69..841aec1e14 100644 --- a/src/test/uint256_tests.cpp +++ b/src/test/uint256_tests.cpp @@ -61,7 +61,7 @@ static std::string ArrayToString(const unsigned char A[], unsigned int width) return Stream.str(); } -// Input is treated as little-endian. +// Takes hex string in reverse byte order. inline uint160 uint160S(std::string_view str) { uint160 rv; @@ -102,6 +102,7 @@ BOOST_AUTO_TEST_CASE( basics ) // constructors, equality, inequality BOOST_CHECK_EQUAL(uint256S(" 0x"+R1L.ToString()+"-trash;%^& "), R1L); BOOST_CHECK_EQUAL(uint256S("\t \n \n \f\n\r\t\v\t 0x"+R1L.ToString()+" \t \n \n \f\n\r\t\v\t "), R1L); BOOST_CHECK_EQUAL(uint256S(""), ZeroL); + BOOST_CHECK_EQUAL(uint256S("1"), OneL); BOOST_CHECK_EQUAL(R1L, uint256S(R1ArrayHex)); BOOST_CHECK_EQUAL(uint256(R1L), R1L); BOOST_CHECK_EQUAL(uint256(ZeroL), ZeroL); @@ -155,9 +156,15 @@ BOOST_AUTO_TEST_CASE( comparison ) // <= >= < > BOOST_CHECK_LT(R1S, MaxS); BOOST_CHECK_LT(R2S, MaxS); - // Verify hex strings are little-endian - BOOST_CHECK_LT(uint256S("2000000000000000000000000000000000000000000000000000000000000001"), - uint256S("1000000000000000000000000000000000000000000000000000000000000002")); + // Non-arithmetic uint256s compare from the beginning of their inner arrays: + BOOST_CHECK_LT(R2L, R1L); + // Ensure first element comparisons give the same order as above: + BOOST_CHECK_LT(*R2L.begin(), *R1L.begin()); + // Ensure last element comparisons give a different result (swapped params): + BOOST_CHECK_LT(*(R1L.end()-1), *(R2L.end()-1)); + // Hex strings represent reverse-encoded bytes, with lexicographic ordering: + BOOST_CHECK_LT(uint256S("1000000000000000000000000000000000000000000000000000000000000000"), + uint256S("0000000000000000000000000000000000000000000000000000000000000001")); } BOOST_AUTO_TEST_CASE(methods) // GetHex SetHexDeprecated FromHex begin() end() size() GetLow64 GetSerializeSize, Serialize, Unserialize @@ -175,11 +182,11 @@ BOOST_AUTO_TEST_CASE(methods) // GetHex SetHexDeprecated FromHex begin() end() s BOOST_CHECK_EQUAL(uint256::FromHex(ZeroL.ToString()).value(), uint256()); TmpL = uint256::FromHex(R1L.ToString()).value(); - BOOST_CHECK_EQUAL_COLLECTIONS(R1L.begin(), R1L.end(), R1Array, R1Array + R1L.size()); - BOOST_CHECK_EQUAL_COLLECTIONS(TmpL.begin(), TmpL.end(), R1Array, R1Array + TmpL.size()); - BOOST_CHECK_EQUAL_COLLECTIONS(R2L.begin(), R2L.end(), R2Array, R2Array + R2L.size()); - BOOST_CHECK_EQUAL_COLLECTIONS(ZeroL.begin(), ZeroL.end(), ZeroArray, ZeroArray + ZeroL.size()); - BOOST_CHECK_EQUAL_COLLECTIONS(OneL.begin(), OneL.end(), OneArray, OneArray + OneL.size()); + BOOST_CHECK_EQUAL_COLLECTIONS(R1L.begin(), R1L.end(), R1Array, R1Array + uint256::size()); + BOOST_CHECK_EQUAL_COLLECTIONS(TmpL.begin(), TmpL.end(), R1Array, R1Array + uint256::size()); + BOOST_CHECK_EQUAL_COLLECTIONS(R2L.begin(), R2L.end(), R2Array, R2Array + uint256::size()); + BOOST_CHECK_EQUAL_COLLECTIONS(ZeroL.begin(), ZeroL.end(), ZeroArray, ZeroArray + uint256::size()); + BOOST_CHECK_EQUAL_COLLECTIONS(OneL.begin(), OneL.end(), OneArray, OneArray + uint256::size()); BOOST_CHECK_EQUAL(R1L.size(), sizeof(R1L)); BOOST_CHECK_EQUAL(sizeof(R1L), 32); BOOST_CHECK_EQUAL(R1L.size(), 32); @@ -221,11 +228,11 @@ BOOST_AUTO_TEST_CASE(methods) // GetHex SetHexDeprecated FromHex begin() end() s BOOST_CHECK_EQUAL(uint160::FromHex(ZeroS.ToString()).value(), uint160()); TmpS = uint160::FromHex(R1S.ToString()).value(); - BOOST_CHECK_EQUAL_COLLECTIONS(R1S.begin(), R1S.end(), R1Array, R1Array + R1S.size()); - BOOST_CHECK_EQUAL_COLLECTIONS(TmpS.begin(), TmpS.end(), R1Array, R1Array + TmpS.size()); - BOOST_CHECK_EQUAL_COLLECTIONS(R2S.begin(), R2S.end(), R2Array, R2Array + R2S.size()); - BOOST_CHECK_EQUAL_COLLECTIONS(ZeroS.begin(), ZeroS.end(), ZeroArray, ZeroArray + ZeroS.size()); - BOOST_CHECK_EQUAL_COLLECTIONS(OneS.begin(), OneS.end(), OneArray, OneArray + OneS.size()); + BOOST_CHECK_EQUAL_COLLECTIONS(R1S.begin(), R1S.end(), R1Array, R1Array + uint160::size()); + BOOST_CHECK_EQUAL_COLLECTIONS(TmpS.begin(), TmpS.end(), R1Array, R1Array + uint160::size()); + BOOST_CHECK_EQUAL_COLLECTIONS(R2S.begin(), R2S.end(), R2Array, R2Array + uint160::size()); + BOOST_CHECK_EQUAL_COLLECTIONS(ZeroS.begin(), ZeroS.end(), ZeroArray, ZeroArray + uint160::size()); + BOOST_CHECK_EQUAL_COLLECTIONS(OneS.begin(), OneS.end(), OneArray, OneArray + uint160::size()); BOOST_CHECK_EQUAL(R1S.size(), sizeof(R1S)); BOOST_CHECK_EQUAL(sizeof(R1S), 20); BOOST_CHECK_EQUAL(R1S.size(), 20); @@ -310,7 +317,7 @@ BOOST_AUTO_TEST_CASE(parse) { std::string s_12{"0000000000000000000000000000000000000000000000000000000000000012"}; BOOST_CHECK_EQUAL(uint256S("12\0").GetHex(), s_12); - BOOST_CHECK_EQUAL(uint256S(std::string{"12\0", 3}).GetHex(), s_12); + BOOST_CHECK_EQUAL(uint256S(std::string_view{"12\0", 3}).GetHex(), s_12); BOOST_CHECK_EQUAL(uint256S("0x12").GetHex(), s_12); BOOST_CHECK_EQUAL(uint256S(" 0x12").GetHex(), s_12); BOOST_CHECK_EQUAL(uint256S(" 12").GetHex(), s_12); @@ -318,7 +325,7 @@ BOOST_AUTO_TEST_CASE(parse) { std::string s_1{uint256::ONE.GetHex()}; BOOST_CHECK_EQUAL(uint256S("1\0").GetHex(), s_1); - BOOST_CHECK_EQUAL(uint256S(std::string{"1\0", 2}).GetHex(), s_1); + BOOST_CHECK_EQUAL(uint256S(std::string_view{"1\0", 2}).GetHex(), s_1); BOOST_CHECK_EQUAL(uint256S("0x1").GetHex(), s_1); BOOST_CHECK_EQUAL(uint256S(" 0x1").GetHex(), s_1); BOOST_CHECK_EQUAL(uint256S(" 1").GetHex(), s_1); @@ -326,7 +333,7 @@ BOOST_AUTO_TEST_CASE(parse) { std::string s_0{uint256::ZERO.GetHex()}; BOOST_CHECK_EQUAL(uint256S("\0").GetHex(), s_0); - BOOST_CHECK_EQUAL(uint256S(std::string{"\0", 1}).GetHex(), s_0); + BOOST_CHECK_EQUAL(uint256S(std::string_view{"\0", 1}).GetHex(), s_0); BOOST_CHECK_EQUAL(uint256S("0x").GetHex(), s_0); BOOST_CHECK_EQUAL(uint256S(" 0x").GetHex(), s_0); BOOST_CHECK_EQUAL(uint256S(" ").GetHex(), s_0); diff --git a/src/test/util/setup_common.cpp b/src/test/util/setup_common.cpp index abe3d6a505..3f48ea4375 100644 --- a/src/test/util/setup_common.cpp +++ b/src/test/util/setup_common.cpp @@ -81,7 +81,7 @@ static FastRandomContext g_insecure_rand_ctx_temp_path; std::ostream& operator<<(std::ostream& os, const arith_uint256& num) { - os << ArithToUint256(num).ToString(); + os << num.ToString(); return os; } diff --git a/src/uint256.h b/src/uint256.h index f8e78b820f..1d45401aa1 100644 --- a/src/uint256.h +++ b/src/uint256.h @@ -53,17 +53,46 @@ public: std::fill(m_data.begin(), m_data.end(), 0); } + /** Lexicographic ordering + * @note Does NOT match the ordering on the corresponding \ref + * base_uint::CompareTo, which starts comparing from the end. + */ constexpr int Compare(const base_blob& other) const { return std::memcmp(m_data.data(), other.m_data.data(), WIDTH); } friend constexpr bool operator==(const base_blob& a, const base_blob& b) { return a.Compare(b) == 0; } friend constexpr bool operator!=(const base_blob& a, const base_blob& b) { return a.Compare(b) != 0; } friend constexpr bool operator<(const base_blob& a, const base_blob& b) { return a.Compare(b) < 0; } - // Hex string representations are little-endian. + /** @name Hex representation + * + * The reverse-byte hex representation is a convenient way to view the blob + * as a number, because it is consistent with the way the base_uint class + * converts blobs to numbers. + * + * @note base_uint treats the blob as an array of bytes with the numerically + * least significant byte first and the most significant byte last. Because + * numbers are typically written with the most significant digit first and + * the least significant digit last, the reverse hex display of the blob + * corresponds to the same numeric value that base_uint interprets from the + * blob. + * @{*/ std::string GetHex() const; - /** Unlike FromHex this accepts any invalid input, thus it is fragile and deprecated */ + /** Unlike FromHex this accepts any invalid input, thus it is fragile and deprecated! + * + * - Hex numbers that don't specify enough bytes to fill the internal array + * will be treated as setting the beginning of it, which corresponds to + * the least significant bytes when converted to base_uint. + * + * - Hex numbers specifying too many bytes will have the numerically most + * significant bytes (the beginning of the string) narrowed away. + * + * - An odd count of hex digits will result in the high bits of the leftmost + * byte being zero. + * "0x123" => {0x23, 0x1, 0x0, ..., 0x0} + */ void SetHexDeprecated(std::string_view str); std::string ToString() const; + /**@}*/ constexpr const unsigned char* data() const { return m_data.data(); } constexpr unsigned char* data() { return m_data.data(); } @@ -93,7 +122,7 @@ public: namespace detail { /** - * Writes the hex string (treated as little-endian) into a new uintN_t object + * Writes the hex string (in reverse byte order) into a new uintN_t object * and only returns a value iff all of the checks pass: * - Input length is uintN_t::size()*2 * - All characters are hex @@ -134,7 +163,7 @@ public: static const uint256 ONE; }; -/* uint256 from std::string_view, treated as little-endian. +/* uint256 from std::string_view, containing byte-reversed hex encoding. * DEPRECATED. Unlike FromHex this accepts any invalid input, thus it is fragile and deprecated! */ inline uint256 uint256S(std::string_view str) |