diff options
author | fanquake <fanquake@gmail.com> | 2021-11-17 09:55:19 +0800 |
---|---|---|
committer | fanquake <fanquake@gmail.com> | 2021-11-17 09:55:27 +0800 |
commit | b869a784ef2b259f14545bf6bd314fb58c36514b (patch) | |
tree | 65eccc89eccf72e862868e9d6200561e1a2be595 /src | |
parent | 7be37ee850f4dea45d4979c7db25e9e42e342f54 (diff) | |
parent | 9b575f1c734c052b695ce921fb6412b22c18fdb4 (diff) |
Merge bitcoin/bitcoin#23522: Improve fs::PathToString documentation
9b575f1c734c052b695ce921fb6412b22c18fdb4 Improve fs::PathToString documentation (Russell Yanofsky)
Pull request description:
Add a developer note about avoiding `fs::PathToString` in RPCs, and improve some other `fs::PathToString` comments.
Developer note might have been useful in two recent review comments:
- https://github.com/bitcoin/bitcoin/pull/23398#discussion_r741585271
- https://github.com/bitcoin/bitcoin/pull/23155#discussion_r749824259
ACKs for top commit:
laanwj:
Documentation review ACK 9b575f1c734c052b695ce921fb6412b22c18fdb4
jamesob:
ACK https://github.com/bitcoin/bitcoin/pull/23522/commits/9b575f1c734c052b695ce921fb6412b22c18fdb4
prayank23:
ACK https://github.com/bitcoin/bitcoin/pull/23522/commits/9b575f1c734c052b695ce921fb6412b22c18fdb4
hebasto:
ACK 9b575f1c734c052b695ce921fb6412b22c18fdb4
shaavan:
ACK 9b575f1c734c052b695ce921fb6412b22c18fdb4
Tree-SHA512: b8b3ecb6208c3897241e4f24dcec64fe7cf091bc79388862cf5f4b315cb8e804939981c4bed4c81dbff99ec9f750bad99015d0f04890704ac9df63c2a6719b6d
Diffstat (limited to 'src')
-rw-r--r-- | src/dbwrapper.cpp | 4 | ||||
-rw-r--r-- | src/fs.h | 45 |
2 files changed, 28 insertions, 21 deletions
diff --git a/src/dbwrapper.cpp b/src/dbwrapper.cpp index 2fdc54464a..dbae2c45f2 100644 --- a/src/dbwrapper.cpp +++ b/src/dbwrapper.cpp @@ -136,6 +136,10 @@ CDBWrapper::CDBWrapper(const fs::path& path, size_t nCacheSize, bool fMemory, bo TryCreateDirectories(path); LogPrintf("Opening LevelDB in %s\n", fs::PathToString(path)); } + // PathToString() return value is safe to pass to leveldb open function, + // because on POSIX leveldb passes the byte string directly to ::open(), and + // on Windows it converts from UTF-8 to UTF-16 before calling ::CreateFileW + // (see env_posix.cc and env_windows.cc). leveldb::Status status = leveldb::DB::Open(options, fs::PathToString(path), &pdb); dbwrapper_private::HandleError(status); LogPrintf("Opened LevelDB successfully\n"); @@ -94,31 +94,34 @@ static inline path operator+(path p1, path p2) /** * Convert path object to byte string. On POSIX, paths natively are byte - * strings so this is trivial. On Windows, paths natively are Unicode, so an - * encoding step is necessary. + * strings, so this is trivial. On Windows, paths natively are Unicode, so an + * encoding step is necessary. The inverse of \ref PathToString is \ref + * PathFromString. The strings returned and parsed by these functions can be + * used to call POSIX APIs, and for roundtrip conversion, logging, and + * debugging. * - * The inverse of \ref PathToString is \ref PathFromString. The strings - * returned and parsed by these functions can be used to call POSIX APIs, and - * for roundtrip conversion, logging, and debugging. But they are not - * guaranteed to be valid UTF-8, and are generally meant to be used internally, - * not externally. When communicating with external programs and libraries that - * require UTF-8, fs::path::u8string() and fs::u8path() methods can be used. - * For other applications, if support for non UTF-8 paths is required, or if - * higher-level JSON or XML or URI or C-style escapes are preferred, it may be - * also be appropriate to use different path encoding functions. - * - * Implementation note: On Windows, the std::filesystem::path(string) - * constructor and std::filesystem::path::string() method are not safe to use - * here, because these methods encode the path using C++'s narrow multibyte - * encoding, which on Windows corresponds to the current "code page", which is - * unpredictable and typically not able to represent all valid paths. So - * std::filesystem::path::u8string() and std::filesystem::u8path() functions - * are used instead on Windows. On POSIX, u8string/u8path functions are not - * safe to use because paths are not always valid UTF-8, so plain string - * methods which do not transform the path there are used. + * Because \ref PathToString and \ref PathFromString functions don't specify an + * encoding, they are meant to be used internally, not externally. They are not + * appropriate to use in applications requiring UTF-8, where + * fs::path::u8string() and fs::u8path() methods should be used instead. Other + * applications could require still different encodings. For example, JSON, XML, + * or URI applications might prefer to use higher level escapes (\uXXXX or + * &XXXX; or %XX) instead of multibyte encoding. Rust, Python, Java applications + * may require encoding paths with their respective UTF-8 derivatives WTF-8, + * PEP-383, and CESU-8 (see https://en.wikipedia.org/wiki/UTF-8#Derivatives). */ static inline std::string PathToString(const path& path) { + // Implementation note: On Windows, the std::filesystem::path(string) + // constructor and std::filesystem::path::string() method are not safe to + // use here, because these methods encode the path using C++'s narrow + // multibyte encoding, which on Windows corresponds to the current "code + // page", which is unpredictable and typically not able to represent all + // valid paths. So std::filesystem::path::u8string() and + // std::filesystem::u8path() functions are used instead on Windows. On + // POSIX, u8string/u8path functions are not safe to use because paths are + // not always valid UTF-8, so plain string methods which do not transform + // the path there are used. #ifdef WIN32 return path.u8string(); #else |