diff options
author | MarcoFalke <falke.marco@gmail.com> | 2022-03-10 12:49:47 +0100 |
---|---|---|
committer | MarcoFalke <falke.marco@gmail.com> | 2022-03-10 12:49:50 +0100 |
commit | 76d44e832f5b9a86c5d37b583825f6ddb6096e7e (patch) | |
tree | 5cf33bae625265aa61a2b58fb11f9f34fcaf7c3d /src | |
parent | d1a940f729a7186f055526eddfa84445000d8881 (diff) | |
parent | 2f5fd3cf9225aed439d1de767312bb340972d665 (diff) |
Merge bitcoin/bitcoin#24469: test: Correctly decode UTF-8 literal string paths
2f5fd3cf9225aed439d1de767312bb340972d665 test: Correctly decode UTF-8 literal string paths (Ryan Ofsky)
Pull request description:
Call `fs::u8path()` to convert some UTF-8 string literals to paths, instead of relying on the implicit conversion. Fake Macro pointed out in https://github.com/bitcoin/bitcoin/pull/24306#discussion_r818566106 that `fs_tests` are incorrectly decoding some literal UTF-8 paths using the current windows codepage, instead of treating them as UTF-8. This could cause test failures depending what environment windows tests are run under.
The `fs::path` class exists to avoid problems like this, but because it is lenient with `const char*` conversions, under assumption that they are ["safe as long as the literals are ASCII"](https://github.com/bitcoin/bitcoin/blob/727b0cb59259ac63c627b09b503faada1a89bfb8/src/fs.h#L39), bugs like this are still possible.
If we think this is a concern, followup options to try to prevent this bug in the future are:
0. Do nothing
1. Improve the "safe as long as the literals are ASCII" comment. Make it clear that non-ASCII strings are invalid.
2. Drop the implicit `const char*` conversion functions. This would be nice because it would simplifify the `fs::path` class a little, while making it safer. Drawback is that it would require some more verbosity from callers. For example, instead of `GetDataDirNet() / "mempool.dat"` they would have to write `GetDataDirNet() / fs::u8path("mempool.dat")`
3. Keep the implicit `const char*` conversion functions, but make them call `fs::u8path()` internally. Change the "safe as long as the literals are *ASCII*" comment to "safe as long as the literals are *UTF-8*".
I'd be happy with 0, 1, or 2. I'd be a little resistant to 3 even though it was would add more safety, because it would slightly increase complexity, and because I think it would encourage representing paths as strings, when I think there are so many footguns associated with paths as strings, that it's best to convert strings to paths at the earliest point possible, and convert paths to strings at the latest point possible.
ACKs for top commit:
laanwj:
Code review ACK 2f5fd3cf9225aed439d1de767312bb340972d665
w0xlt:
crACK 2f5fd3c
Tree-SHA512: 9c56714744592094d873b79843b526d20f31ed05eff957d698368d66025764eae8bfd5305d5f7b6cc38803f0d85fa5552003e5c6cacf1e076ea6d313bcbc960c
Diffstat (limited to 'src')
-rw-r--r-- | src/test/fs_tests.cpp | 6 |
1 files changed, 3 insertions, 3 deletions
diff --git a/src/test/fs_tests.cpp b/src/test/fs_tests.cpp index 5875f0218f..d2554483d4 100644 --- a/src/test/fs_tests.cpp +++ b/src/test/fs_tests.cpp @@ -46,8 +46,8 @@ BOOST_AUTO_TEST_CASE(fsbridge_fstream) { fs::path tmpfolder = m_args.GetDataDirBase(); // tmpfile1 should be the same as tmpfile2 - fs::path tmpfile1 = tmpfolder / "fs_tests_₿_🏃"; - fs::path tmpfile2 = tmpfolder / "fs_tests_₿_🏃"; + fs::path tmpfile1 = tmpfolder / fs::u8path("fs_tests_₿_🏃"); + fs::path tmpfile2 = tmpfolder / fs::u8path("fs_tests_₿_🏃"); { std::ofstream file{tmpfile1}; file << "bitcoin"; @@ -86,7 +86,7 @@ BOOST_AUTO_TEST_CASE(fsbridge_fstream) } { // Join an absolute path and a relative path. - fs::path p = fsbridge::AbsPathJoin(tmpfolder, "fs_tests_₿_🏃"); + fs::path p = fsbridge::AbsPathJoin(tmpfolder, fs::u8path("fs_tests_₿_🏃")); BOOST_CHECK(p.is_absolute()); BOOST_CHECK_EQUAL(tmpfile1, p); } |