diff options
author | Ryan Ofsky <ryan@ofsky.org> | 2024-08-04 21:20:39 -0400 |
---|---|---|
committer | Ryan Ofsky <ryan@ofsky.org> | 2024-08-04 22:27:10 -0400 |
commit | 1a7d20509f46f0cd38302da78dba9a0c9bb24226 (patch) | |
tree | f281bc377a68b2a6ed4a7d1709a4c65776af62e8 /src/test/util | |
parent | 55d19945efbb58200c37fb8322ed3494bf67f9b1 (diff) | |
parent | 73e3fa10b4dd63b7767d6b6f270df66971067341 (diff) |
Merge bitcoin/bitcoin#30526: doc: Correct uint256 hex string endianness
73e3fa10b4dd63b7767d6b6f270df66971067341 doc + test: Correct uint256 hex string endianness (Hodlinator)
Pull request description:
This PR is a follow-up to #30436.
Only changes test-code and modifies/adds comments.
Byte order of hex string representation was wrongfully documented as little-endian, but are in fact closer to "big-endian" (endianness is a memory-order concept rather than a numeric concept). `[arith_]uint256` both store their data in arrays with little-endian byte order (`arith_uint256` has host byte order within each `uint32_t` element).
**uint256_tests.cpp** - Avoid using variable from the left side of the condition in the right side. Credits to @maflcko: https://github.com/bitcoin/bitcoin/pull/30436#discussion_r1688273553
**setup_common.cpp** - Skip needless ArithToUint256-conversion. Credits to @stickies-v: https://github.com/bitcoin/bitcoin/pull/30436#discussion_r1688621638
---
<details>
<summary>
## Logical reasoning for endianness
</summary>
1. Comparing an `arith_uint256` (`base_uint<256>`) to a `uint64_t` compares the beginning of the array, and verifies the remaining elements are zero.
```C++
template <unsigned int BITS>
bool base_uint<BITS>::EqualTo(uint64_t b) const
{
for (int i = WIDTH - 1; i >= 2; i--) {
if (pn[i])
return false;
}
if (pn[1] != (b >> 32))
return false;
if (pn[0] != (b & 0xfffffffful))
return false;
return true;
}
```
...that is consistent with little endian ordering of the array.
2. They have the same endianness (but `arith_*` has host-ordering of each `uint32_t` element):
```C++
arith_uint256 UintToArith256(const uint256 &a)
{
arith_uint256 b;
for(int x=0; x<b.WIDTH; ++x)
b.pn[x] = ReadLE32(a.begin() + x*4);
return b;
}
```
### String conversions
The reversal of order which happens when converting hex-strings <=> uint256 means strings are actually closer to big-endian, see the end of `base_blob<BITS>::SetHexDeprecated`:
```C++
unsigned char* p1 = m_data.data();
unsigned char* pend = p1 + WIDTH;
while (digits > 0 && p1 < pend) {
*p1 = ::HexDigit(trimmed[--digits]);
if (digits > 0) {
*p1 |= ((unsigned char)::HexDigit(trimmed[--digits]) << 4);
p1++;
}
}
```
Same reversal here:
```C++
template <unsigned int BITS>
std::string base_blob<BITS>::GetHex() const
{
uint8_t m_data_rev[WIDTH];
for (int i = 0; i < WIDTH; ++i) {
m_data_rev[i] = m_data[WIDTH - 1 - i];
}
return HexStr(m_data_rev);
}
```
It now makes sense to me that `SetHexDeprecated`, upon receiving a shorter hex string that requires zero-padding, would pad as if the missing hex chars where towards the end of the little-endian byte array, as they are the most significant bytes. "Big-endian" string representation is also consistent with the case where `SetHexDeprecated` receives too many hex digits and discards the leftmost ones, as a form of integer narrowing takes place.
### How I got it wrong in #30436
Previously I used the less than (`<`) comparison to prove endianness, but for `uint256` it uses `memcmp` and thereby gives priority to the *lower* bytes at the beginning of the array.
```C++
constexpr int Compare(const base_blob& other) const { return std::memcmp(m_data.data(), other.m_data.data(), WIDTH); }
```
`arith_uint256` is different in that it begins by comparing the bytes from the end, as it is using little endian representation, where the bytes toward the end are more significant.
```C++
template <unsigned int BITS>
int base_uint<BITS>::CompareTo(const base_uint<BITS>& b) const
{
for (int i = WIDTH - 1; i >= 0; i--) {
if (pn[i] < b.pn[i])
return -1;
if (pn[i] > b.pn[i])
return 1;
}
return 0;
}
```
(The commit documents that `base_blob::Compare()` is doing lexicographic ordering unlike the `arith_*`-variant which is doing numeric ordering).
</details>
ACKs for top commit:
paplorinc:
ACK 73e3fa10b4dd63b7767d6b6f270df66971067341
ryanofsky:
Code review ACK 73e3fa10b4dd63b7767d6b6f270df66971067341
Tree-SHA512: 121630c37ab01aa7f7097f10322ab37da3cbc0696a6bbdbf2bbd6db180dc5938c7ed91003aaa2df7cf4a4106f973f5118ba541b5e077cf3588aa641bbd528f4e
Diffstat (limited to 'src/test/util')
-rw-r--r-- | src/test/util/setup_common.cpp | 2 |
1 files changed, 1 insertions, 1 deletions
diff --git a/src/test/util/setup_common.cpp b/src/test/util/setup_common.cpp index abe3d6a505..3f48ea4375 100644 --- a/src/test/util/setup_common.cpp +++ b/src/test/util/setup_common.cpp @@ -81,7 +81,7 @@ static FastRandomContext g_insecure_rand_ctx_temp_path; std::ostream& operator<<(std::ostream& os, const arith_uint256& num) { - os << ArithToUint256(num).ToString(); + os << num.ToString(); return os; } |