author | Ryan Ofsky <ryan@ofsky.org> | 2022-02-15 09:29:53 -0500 |
---|---|---|
committer | Ryan Ofsky <ryan@ofsky.org> | 2022-02-15 09:29:53 -0500 |
commit | dc1e7ad7a5713d885f70ccc6c93e7a4c07e76559 (patch) | |
tree | be7dc000f44b76b645d43d2d8e40f40b6aae478c /doc/design | |
parent | 8fe6f5a6fbcd8083d916cb630f35f8f5980d6825 (diff) |
Add doc/design/libraries.md
Diffstat (limited to 'doc/design')
-rw-r--r-- | doc/design/assumeutxo.md | 138 | ||||
-rw-r--r-- | doc/design/libraries.md | 104 | ||||
-rw-r--r-- | doc/design/multiprocess.md | 72 |
3 files changed, 314 insertions, 0 deletions
diff --git a/doc/design/assumeutxo.md b/doc/design/assumeutxo.md
new file mode 100644
index 0000000000..2726cf779b
--- /dev/null
+++ b/doc/design/assumeutxo.md
@@ -0,0 +1,138 @@
+# assumeutxo
+
+Assumeutxo is a feature that allows fast bootstrapping of a validating bitcoind
+instance with a very similar security model to assumevalid.
+
+The RPC commands `dumptxoutset` and `loadtxoutset` are used to respectively generate
+and load UTXO snapshots. The utility script `./contrib/devtools/utxo_snapshot.sh` may
+be of use.
+
+## General background
+
+- [assumeutxo proposal](https://github.com/jamesob/assumeutxo-docs/tree/2019-04-proposal/proposal)
+- [Github issue](https://github.com/bitcoin/bitcoin/issues/15605)
+- [draft PR](https://github.com/bitcoin/bitcoin/pull/15606)
+
+## Design notes
+
+- A new block index `nStatus` flag is introduced, `BLOCK_ASSUMED_VALID`, to mark block
+  index entries that are required to be assumed-valid by a chainstate created
+  from a UTXO snapshot. This flag is mostly used as a way to modify certain
+  CheckBlockIndex() logic to account for index entries that are pending validation by a
+  chainstate running asynchronously in the background. We also use this flag to control
+  which index entries are added to setBlockIndexCandidates during LoadBlockIndex().
+
+- Indexing implementations via BaseIndex can no longer assume that indexation happens
+  sequentially, since background validation chainstates can submit BlockConnected
+  events out of order with the active chain.
+
+- The concept of UTXO snapshots is treated as an implementation detail that lives
+  behind the ChainstateManager interface. The external presentation of the changes
+  required to facilitate the use of UTXO snapshots is the understanding that there are
+  now certain regions of the chain that can be temporarily assumed to be valid (using
+  the nStatus flag mentioned above). In certain cases, e.g. wallet rescanning, this is
+  very similar to dealing with a pruned chain.
+
+  Logic outside ChainstateManager should try not to know about snapshots, instead
+  preferring to work in terms of more general states like assumed-valid.
+
+
+## Chainstate phases
+
+Chainstate within the system goes through a number of phases when UTXO snapshots are
+used, as managed by `ChainstateManager`. At various points there can be multiple
+`CChainState` objects in existence to facilitate both maintaining the network tip and
+performing historical validation of the assumed-valid chain.
+
+It is worth noting that though there are multiple separate chainstates, those
+chainstates share use of a common block index (i.e. they hold the same `BlockManager`
+reference).
+
+The subheadings below outline the phases and the corresponding changes to chainstate
+data.
+
+### "Normal" operation via initial block download
+
+`ChainstateManager` manages a single CChainState object, for which
+`m_snapshot_blockhash` is null. This chainstate is (maybe obviously)
+considered active. This is the "traditional" mode of operation for bitcoind.
+
+| | |
+| ---------- | ----------- |
+| number of chainstates | 1 |
+| active chainstate | ibd |
+
+### User loads a UTXO snapshot via `loadtxoutset` RPC
+
+`ChainstateManager` initializes a new chainstate (see `ActivateSnapshot()`) to load the
+snapshot contents into. During snapshot load and validation (see
+`PopulateAndValidateSnapshot()`), the new chainstate is not considered active and the
+original chainstate remains in use as active.
+
+| | |
+| ---------- | ----------- |
+| number of chainstates | 2 |
+| active chainstate | ibd |
+
+Once the snapshot chainstate is loaded and validated, it is promoted to active
+chainstate and a sync to tip begins. A new chainstate directory is created in the
+datadir for the snapshot chainstate called
+`chainstate_[SHA256 blockhash of snapshot base block]`.
+
+| | |
+| ---------- | ----------- |
+| number of chainstates | 2 |
+| active chainstate | snapshot |
+
+The snapshot begins to sync to tip from its base block, technically in parallel with
+the original chainstate, but it is given priority during block download and is
+allocated most of the cache (see `MaybeRebalanceCaches()` and usages) as our chief
+consideration is getting to network tip.
+
+**Failure consideration:** if shutdown happens at any point during this phase, both
+chainstates will be detected during the next init and the process will resume.
+
+### Snapshot chainstate hits network tip
+
+Once the snapshot chainstate leaves IBD, caches are rebalanced
+(via `MaybeRebalanceCaches()` in `ActivateBestChain()`) and more cache is given
+to the background chainstate, which is responsible for doing full validation of the
+assumed-valid parts of the chain.
+
+**Note:** at this point, ValidationInterface callbacks will be coming in from both
+chainstates. Considerations here must be made for indexing, which may no longer be happening
+sequentially.
+
+### Background chainstate hits snapshot base block
+
+Once the tip of the background chainstate hits the base block of the snapshot
+chainstate, we stop use of the background chainstate by setting `m_stop_use` (not yet
+committed - see #15606), in `CompleteSnapshotValidation()`, which is checked in
+`ActivateBestChain()`. We hash the background chainstate's UTXO set contents and
+ensure it matches the compiled value in `CMainParams::m_assumeutxo_data`.
+
+The background chainstate data lingers on disk until shutdown, when in
+`ChainstateManager::Reset()`, the background chainstate is cleaned up with
+`ValidatedSnapshotShutdownCleanup()`, which renames the `chainstate_[hash]` datadir as
+`chainstate`.
+
+| | |
+| ---------- | ----------- |
+| number of chainstates | 2 (ibd has `m_stop_use=true`) |
+| active chainstate | snapshot |
+
+**Failure consideration:** if bitcoind unexpectedly halts after `m_stop_use` is set on
+the background chainstate but before `CompleteSnapshotValidation()` can finish, the
+need to complete snapshot validation will be detected on subsequent init by
+`ChainstateManager::CheckForUncleanShutdown()`.
+
+### Bitcoind restarts sometime after snapshot validation has completed
+
+When bitcoind initializes again, what began as the snapshot chainstate is now
+indistinguishable from a chainstate that has been built from the traditional IBD
+process, and will be initialized as such.
+
+| | |
+| ---------- | ----------- |
+| number of chainstates | 1 |
+| active chainstate | ibd |
diff --git a/doc/design/libraries.md b/doc/design/libraries.md
new file mode 100644
index 0000000000..75f8d60ba0
--- /dev/null
+++ b/doc/design/libraries.md
@@ -0,0 +1,104 @@
+# Libraries
+
+| Name | Description |
+|--------------------------|-------------|
+| *libbitcoin_cli* | RPC client functionality used by *bitcoin-cli* executable |
+| *libbitcoin_common* | Home for common functionality shared by different executables and libraries. Similar to *libbitcoin_util*, but higher-level (see [Dependencies](#dependencies)). |
+| *libbitcoin_consensus* | Stable, backwards-compatible consensus functionality used by *libbitcoin_node* and *libbitcoin_wallet* and also exposed as a [shared library](../shared-libraries.md). |
+| *libbitcoinconsensus* | Shared library build of static *libbitcoin_consensus* library |
+| *libbitcoin_kernel* | Consensus engine and support library used for validation by *libbitcoin_node* and also exposed as a [shared library](../shared-libraries.md). |
+| *libbitcoinqt* | GUI functionality used by *bitcoin-qt* and *bitcoin-gui* executables |
+| *libbitcoin_ipc* | IPC functionality used by *bitcoin-node*, *bitcoin-wallet*, *bitcoin-gui* executables to communicate when [`--enable-multiprocess`](multiprocess.md) is used. |
+| *libbitcoin_node* | P2P and RPC server functionality used by *bitcoind* and *bitcoin-qt* executables. |
+| *libbitcoin_util* | Home for common functionality shared by different executables and libraries. Similar to *libbitcoin_common*, but lower-level (see [Dependencies](#dependencies)). |
+| *libbitcoin_wallet* | Wallet functionality used by *bitcoind* and *bitcoin-wallet* executables. |
+| *libbitcoin_wallet_tool* | Lower-level wallet functionality used by *bitcoin-wallet* executable. |
+| *libbitcoin_zmq* | [ZeroMQ](../zmq.md) functionality used by *bitcoind* and *bitcoin-qt* executables. |
+
+## Conventions
+
+- Most libraries are internal libraries and have APIs which are completely unstable! There are few or no restrictions on backwards compatibility or rules about external dependencies. Exceptions are *libbitcoin_consensus* and *libbitcoin_kernel* which have external interfaces documented at [../shared-libraries.md](../shared-libraries.md).
+
+- Generally each library should have a corresponding source directory and namespace. Source code organization is a work in progress, so it is true that some namespaces are applied inconsistently, and if you look at [`libbitcoin_*_SOURCES`](../../src/Makefile.am) lists you can see that many libraries pull in files from outside their source directory. But when working with libraries, it is good to follow a consistent pattern like:
+
+  - *libbitcoin_node* code lives in `src/node/` in the `node::` namespace
+  - *libbitcoin_wallet* code lives in `src/wallet/` in the `wallet::` namespace
+  - *libbitcoin_ipc* code lives in `src/ipc/` in the `ipc::` namespace
+  - *libbitcoin_util* code lives in `src/util/` in the `util::` namespace
+  - *libbitcoin_consensus* code lives in `src/consensus/` in the `Consensus::` namespace
+
+## Dependencies
+
+- Libraries should minimize what other libraries they depend on, and only reference symbols following the arrows shown in the dependency graph below:
+
+<table><tr><td>
+
+```mermaid
+
+%%{ init : { "flowchart" : { "curve" : "linear" }}}%%
+
+graph TD;
+
+bitcoin-cli[bitcoin-cli]-->libbitcoin_cli;
+
+bitcoind[bitcoind]-->libbitcoin_node;
+bitcoind[bitcoind]-->libbitcoin_wallet;
+
+bitcoin-qt[bitcoin-qt]-->libbitcoin_node;
+bitcoin-qt[bitcoin-qt]-->libbitcoinqt;
+bitcoin-qt[bitcoin-qt]-->libbitcoin_wallet;
+
+bitcoin-wallet[bitcoin-wallet]-->libbitcoin_wallet;
+bitcoin-wallet[bitcoin-wallet]-->libbitcoin_wallet_tool;
+
+libbitcoin_cli-->libbitcoin_common;
+libbitcoin_cli-->libbitcoin_util;
+
+libbitcoin_common-->libbitcoin_util;
+libbitcoin_common-->libbitcoin_consensus;
+
+libbitcoin_kernel-->libbitcoin_consensus;
+libbitcoin_kernel-->libbitcoin_util;
+
+libbitcoin_node-->libbitcoin_common;
+libbitcoin_node-->libbitcoin_consensus;
+libbitcoin_node-->libbitcoin_kernel;
+libbitcoin_node-->libbitcoin_util;
+
+libbitcoinqt-->libbitcoin_common;
+libbitcoinqt-->libbitcoin_util;
+
+libbitcoin_wallet-->libbitcoin_common;
+libbitcoin_wallet-->libbitcoin_util;
+
+libbitcoin_wallet_tool-->libbitcoin_util;
+libbitcoin_wallet_tool-->libbitcoin_wallet;
+
+classDef bold stroke-width:2px, font-weight:bold, font-size: smaller;
+class bitcoin-qt,bitcoind,bitcoin-cli,bitcoin-wallet bold
+```
+</td></tr><tr><td>
+
+**Dependency graph**. Arrows show linker symbol dependencies. *Consensus* lib depends on nothing. *Util* lib is depended on by everything. *Kernel* lib depends only on consensus and util.
+
+</td></tr></table>
+
+- The graph shows what _linker symbols_ (functions and variables) from each library other libraries can call and reference directly, but it is not a call graph. For example, there is no arrow connecting *libbitcoin_wallet* and *libbitcoin_node* libraries, because these libraries are intended to be modular and not depend on each other's internal implementation details. But wallet code is still able to call node code indirectly through the `interfaces::Chain` abstract class in [`interfaces/chain.h`](../../src/interfaces/chain.h) and node code calls wallet code through the `interfaces::ChainClient` and `interfaces::Chain::Notifications` abstract classes in the same file. In general, defining abstract classes in [`src/interfaces/`](../../src/interfaces/) can be a convenient way of avoiding unwanted direct dependencies or circular dependencies between libraries.
+
+- *libbitcoin_consensus* should be a standalone dependency that any library can depend on, and it should not depend on any other libraries itself.
+
+- *libbitcoin_util* should also be a standalone dependency that any library can depend on, and it should not depend on other internal libraries.
+
+- *libbitcoin_common* should serve a similar function as *libbitcoin_util* and be a place for miscellaneous code used by various daemon, GUI, and CLI applications and libraries to live. It should not depend on anything other than *libbitcoin_util* and *libbitcoin_consensus*. The boundary between _util_ and _common_ is a little fuzzy but historically _util_ has been used for more generic, lower-level things like parsing hex, and _common_ has been used for bitcoin-specific, higher-level things like parsing base58. The difference between util and common is mostly important because *libbitcoin_kernel* is not supposed to depend on *libbitcoin_common*, only *libbitcoin_util*. In general, if it is ever unclear whether it is better to add code to *util* or *common*, it is probably better to add it to *common* unless it is very generically useful or useful particularly to include in the kernel.
+
+
+- *libbitcoin_kernel* should only depend on *libbitcoin_util* and *libbitcoin_consensus*.
+
+- The only thing that should depend on *libbitcoin_kernel* internally should be *libbitcoin_node*. GUI and wallet libraries *libbitcoinqt* and *libbitcoin_wallet* in particular should not depend on *libbitcoin_kernel* and the unneeded functionality it would pull in, like block validation. To the extent that GUI and wallet code need scripting and signing functionality, they should be able to get it from *libbitcoin_consensus*, *libbitcoin_common*, and *libbitcoin_util*, instead of *libbitcoin_kernel*.
+
+- GUI, node, and wallet code internal implementations should all be independent of each other, and the *libbitcoinqt*, *libbitcoin_node*, *libbitcoin_wallet* libraries should never reference each other's symbols. They should only call each other through [`src/interfaces/`](../../src/interfaces/) abstract interfaces.
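
The dependency rules listed in libraries.md can be checked mechanically. The following is a minimal Python sketch and is not part of this commit or of the Bitcoin Core tree: the `DEPS` table is transcribed by hand from the library-to-library arrows in the dependency graph, and the helper names `reachable` and `check_rules` are hypothetical.

```python
# Direct linker dependencies between libraries, transcribed from the
# mermaid dependency graph (executables omitted).
DEPS = {
    "libbitcoin_cli": {"libbitcoin_common", "libbitcoin_util"},
    "libbitcoin_common": {"libbitcoin_util", "libbitcoin_consensus"},
    "libbitcoin_consensus": set(),
    "libbitcoin_kernel": {"libbitcoin_consensus", "libbitcoin_util"},
    "libbitcoin_node": {"libbitcoin_common", "libbitcoin_consensus",
                        "libbitcoin_kernel", "libbitcoin_util"},
    "libbitcoinqt": {"libbitcoin_common", "libbitcoin_util"},
    "libbitcoin_util": set(),
    "libbitcoin_wallet": {"libbitcoin_common", "libbitcoin_util"},
    "libbitcoin_wallet_tool": {"libbitcoin_util", "libbitcoin_wallet"},
}


def reachable(deps, lib):
    """Return every library reachable from `lib` by following arrows."""
    seen = set()
    stack = [lib]
    while stack:
        for dep in deps[stack.pop()]:
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def check_rules(deps):
    """Return a list of rule violations; an empty list means the graph complies."""
    problems = []
    # Libraries whose direct dependencies are restricted to an allow-list.
    allowed = {
        "libbitcoin_consensus": set(),
        "libbitcoin_util": set(),
        "libbitcoin_common": {"libbitcoin_util", "libbitcoin_consensus"},
        "libbitcoin_kernel": {"libbitcoin_util", "libbitcoin_consensus"},
    }
    for lib, ok in allowed.items():
        extra = deps[lib] - ok
        if extra:
            problems.append(f"{lib} must not depend on {sorted(extra)}")
    # Only libbitcoin_node may depend on libbitcoin_kernel.
    for lib in sorted(deps):
        if "libbitcoin_kernel" in deps[lib] and lib != "libbitcoin_node":
            problems.append(f"{lib} must not depend on libbitcoin_kernel")
    # GUI, node, and wallet libraries must never reach each other.
    isolated = {"libbitcoinqt", "libbitcoin_node", "libbitcoin_wallet"}
    for lib in sorted(isolated):
        clash = reachable(deps, lib) & (isolated - {lib})
        if clash:
            problems.append(f"{lib} must not depend on {sorted(clash)}")
    return problems
```

Calling `check_rules(DEPS)` on the table as transcribed returns an empty list. Adding an arrow such as *libbitcoin_wallet* to *libbitcoin_node* makes it report a violation, which is the kind of direct dependency the `src/interfaces/` abstract classes are meant to avoid.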
+
+## Work in progress
+
+- Validation code is moving from *libbitcoin_node* to *libbitcoin_kernel* as part of [The libbitcoinkernel Project #24303](https://github.com/bitcoin/bitcoin/issues/24303)
+- Source code organization is discussed in general in [Library source code organization #15732](https://github.com/bitcoin/bitcoin/issues/15732)
diff --git a/doc/design/multiprocess.md b/doc/design/multiprocess.md
new file mode 100644
index 0000000000..e3f389a6d3
--- /dev/null
+++ b/doc/design/multiprocess.md
@@ -0,0 +1,72 @@
+# Multiprocess Bitcoin
+
+On unix systems, the `--enable-multiprocess` build option can be passed to `./configure` to build new `bitcoin-node`, `bitcoin-wallet`, and `bitcoin-gui` executables alongside existing `bitcoind` and `bitcoin-qt` executables.
+
+`bitcoin-node` is a drop-in replacement for `bitcoind`, and `bitcoin-gui` is a drop-in replacement for `bitcoin-qt`, and there are no differences in use or external behavior between the new and old executables. But internally (after [#10102](https://github.com/bitcoin/bitcoin/pull/10102)), `bitcoin-gui` will spawn a `bitcoin-node` process to run P2P and RPC code, communicating with it across a socket pair, and `bitcoin-node` will spawn `bitcoin-wallet` to run wallet code, also communicating over a socket pair. This will let node, wallet, and GUI code run in separate address spaces for better isolation, and allow future improvements like being able to start and stop components independently on different machines and environments.
+
+## Next steps
+
+Specific next steps after [#10102](https://github.com/bitcoin/bitcoin/pull/10102) will be:
+
+- [ ] Adding `-ipcbind` and `-ipcconnect` options to `bitcoin-node`, `bitcoin-wallet`, and `bitcoin-gui` executables so they can listen and connect to TCP ports and unix socket paths. This will allow separate processes to be started and stopped any time and connect to each other.
+- [ ] Adding `-server` and `-rpcbind` options to the `bitcoin-wallet` executable so wallet processes can handle RPC requests directly without going through the node.
+- [ ] Supporting windows, not just unix systems. The existing socket code is already cross-platform, so the only windows-specific code that needs to be written is code spawning a process and passing a socket descriptor. This can be implemented with `CreateProcess` and `WSADuplicateSocket`. Example: https://memset.wordpress.com/2010/10/13/win32-api-passing-socket-with-ipc-method/.
+- [ ] Adding sandbox features, restricting subprocess access to resources and data. See [https://eklitzke.org/multiprocess-bitcoin](https://eklitzke.org/multiprocess-bitcoin).
+
+## Debugging
+
+The `-debug=ipc` command line option can be used to see requests and responses between processes.
+
+## Installation
+
+The multiprocess feature requires [Cap'n Proto](https://capnproto.org/) and [libmultiprocess](https://github.com/chaincodelabs/libmultiprocess) as dependencies. A simple way to get started using it without installing these dependencies manually is to use the [depends system](../depends) with the `MULTIPROCESS=1` [dependency option](../depends#dependency-options) passed to make:
+
+```
+cd <BITCOIN_SOURCE_DIRECTORY>
+make -C depends NO_QT=1 MULTIPROCESS=1
+CONFIG_SITE=$PWD/depends/x86_64-pc-linux-gnu/share/config.site ./configure
+make
+src/bitcoin-node -regtest -printtoconsole -debug=ipc
+BITCOIND=bitcoin-node test/functional/test_runner.py
+```
+
+The configure script will pick up settings and library locations from the depends directory, so there is no need to pass `--enable-multiprocess` as a separate flag when using the depends system (it's controlled by the `MULTIPROCESS=1` option).
+
+Alternately, you can install [Cap'n Proto](https://capnproto.org/) and [libmultiprocess](https://github.com/chaincodelabs/libmultiprocess) packages on your system, and just run `./configure --enable-multiprocess` without using the depends system. The configure script will be able to locate the installed packages via [pkg-config](https://www.freedesktop.org/wiki/Software/pkg-config/). See [Installation](https://github.com/chaincodelabs/libmultiprocess#installation) section of the libmultiprocess readme for install steps. See [build-unix.md](build-unix.md) and [build-osx.md](build-osx.md) for information about installing dependencies in general.
+
+## IPC implementation details
+
+Cross process Node, Wallet, and Chain interfaces are defined in
+[`src/interfaces/`](../src/interfaces/). These are C++ classes which follow
+[conventions](developer-notes.md#internal-interface-guidelines), like passing
+serializable arguments so they can be called from different processes, and
+making methods pure virtual so they can have proxy implementations that forward
+calls between processes.
+
+When Wallet, Node, and Chain code is running in the same process, calling any
+interface method invokes the implementation directly. When code is running in
+different processes, calling an interface method invokes a proxy interface
+implementation that communicates with a remote process and invokes the real
+implementation in the remote process. The
+[libmultiprocess](https://github.com/chaincodelabs/libmultiprocess) code
+generation tool internally generates proxy client classes and proxy server
+classes for this purpose that are thin wrappers around Cap'n Proto
+[client](https://capnproto.org/cxxrpc.html#clients) and
+[server](https://capnproto.org/cxxrpc.html#servers) classes, which handle the
+actual serialization and socket communication.
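
The proxy client and proxy server arrangement described in multiprocess.md can be illustrated with a small Python sketch. It is only an analogy under stated assumptions: the real code is C++ generated by libmultiprocess on top of Cap'n Proto, whereas this sketch uses newline-delimited JSON over a `socket.socketpair()`, with a thread standing in for the remote process. The `Wallet` interface, its `get_balance` method, and all class and function names here are hypothetical.

```python
import json
import socket
import threading
from abc import ABC, abstractmethod


class Wallet(ABC):
    """Hypothetical interface: abstract methods taking serializable arguments."""

    @abstractmethod
    def get_balance(self, account: str) -> int:
        ...


class WalletImpl(Wallet):
    """Real implementation, living on the 'remote' side."""

    def __init__(self):
        self._balances = {"default": 50}

    def get_balance(self, account: str) -> int:
        return self._balances.get(account, 0)


class WalletProxyClient(Wallet):
    """Proxy implementation: serializes each call and sends it over a socket."""

    def __init__(self, sock: socket.socket):
        self._stream = sock.makefile("rw")

    def get_balance(self, account: str) -> int:
        request = {"method": "get_balance", "args": [account]}
        self._stream.write(json.dumps(request) + "\n")
        self._stream.flush()
        return json.loads(self._stream.readline())["result"]


def serve_one_call(impl: Wallet, sock: socket.socket) -> None:
    """Proxy server: decode one request, invoke the real object, send the result."""
    with sock.makefile("rw") as stream:
        request = json.loads(stream.readline())
        result = getattr(impl, request["method"])(*request["args"])
        stream.write(json.dumps({"result": result}) + "\n")
        stream.flush()


def balance_via_socket_pair(account: str) -> int:
    """Call get_balance through the proxy, with a thread acting as the remote process."""
    client_sock, server_sock = socket.socketpair()
    server = threading.Thread(target=serve_one_call, args=(WalletImpl(), server_sock))
    server.start()
    balance = WalletProxyClient(client_sock).get_balance(account)
    server.join()
    client_sock.close()
    server_sock.close()
    return balance
```

Because `WalletProxyClient` and `WalletImpl` implement the same abstract interface, calling code does not need to know whether it holds the real object or a proxy, which mirrors the same-process and cross-process cases described in the text.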
+
+As much as possible, calls between processes are meant to work the same as
+calls within a single process without adding limitations or requiring extra
+implementation effort. Processes communicate with each other by calling regular
+[C++ interface methods](../src/interfaces/README.md). Method arguments and
+return values are automatically serialized and sent between processes. Object
+references and `std::function` arguments are automatically tracked and mapped
+to allow invoked code to call back into invoking code at any time, and there is
+a 1:1 threading model where any thread invoking a method in another process has
+a corresponding thread in the invoked process responsible for executing all
+method calls from the source thread, without blocking I/O or holding up another
+call, and using the same thread local variables, locks, and callbacks between
+calls. The forwarding, tracking, and threading is implemented inside the
+[libmultiprocess](https://github.com/chaincodelabs/libmultiprocess) library
+which has the design goal of making calls between processes look like calls in
+the same process to the extent possible.
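
The 1:1 threading model can be sketched in a few lines of Python. This is an illustration under assumptions, not how libmultiprocess implements it: the hypothetical `ThreadMappedServer` keeps one dedicated worker thread per calling thread, so repeated calls from the same caller run on the same worker and see the same thread-local state.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ThreadMappedServer:
    """Hypothetical server side: one dedicated worker thread per calling thread."""

    def __init__(self):
        self._lock = threading.Lock()
        self._workers = {}  # caller thread id -> single-thread executor

    def call(self, fn, *args):
        """Run fn(*args) on the worker paired with the calling thread."""
        caller = threading.get_ident()
        with self._lock:
            if caller not in self._workers:
                self._workers[caller] = ThreadPoolExecutor(max_workers=1)
            worker = self._workers[caller]
        # Only the caller waits for its own worker; other callers are not held up.
        return worker.submit(fn, *args).result()

    def shutdown(self):
        """Stop all worker threads."""
        with self._lock:
            workers, self._workers = list(self._workers.values()), {}
        for worker in workers:
            worker.shutdown()
```

Two calls made from one thread through `call()` execute on the same worker thread, while a call from a different thread gets its own worker. The sketch ignores details a real implementation has to handle, such as cleaning up a worker when its calling thread exits.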