aboutsummaryrefslogtreecommitdiff
path: root/src/node/blockstorage.h
AgeCommit message (Collapse)Author
2024-07-26Return XOR AutoFile from BlockManager::Open*File()MarcoFalke
This is a refactor, because the XOR key is empty.
2024-07-24refactor: Add FlatFileSeq member variables in BlockManagerTheCharlatan
Instead of constructing a new class every time a file operation is done, construct them once for each of the undo and block file when a new BlockManager is created. In future, this might make it easier to introduce an abstract block store.
2024-07-15Merge bitcoin/bitcoin#30428: log: LogError with FlatFilePos in UndoReadFromDiskRyan Ofsky
fa14e1d9d5c5dc44396a01583ae94480b7bc29ee log: Fix __func__ in LogError in blockstorage module (MarcoFalke) fad59a2f0f37f5b7f6076fd91be43448e35f4b7e log: LogError with FlatFilePos in UndoReadFromDisk (MarcoFalke) aaaa3323f37526862ebf2a2a4bf522c661e6976e refactor: Mark IsBlockPruned const (MarcoFalke) Pull request description: These errors should never happen in normal operation. If they do, knowing the `FlatFilePos` may be useful to determine if data corruption happened. Also, handle the error `pos.IsNull()` as part of `OpenUndoFile`, because it may as well have happened due to data corruption. This mirrors the `LogError` behavior from `ReadBlockFromDisk`. Also, two other fixup commits in this module. ACKs for top commit: kevkevinpal: ACK [fa14e1d](https://github.com/bitcoin/bitcoin/pull/30428/commits/fa14e1d9d5c5dc44396a01583ae94480b7bc29ee) tdb3: cr and light test ACK fa14e1d9d5c5dc44396a01583ae94480b7bc29ee ryanofsky: Code review ACK fa14e1d9d5c5dc44396a01583ae94480b7bc29ee. This should make logging clearer and more consistent Tree-SHA512: abb492a919b4796698d1de0a7874c8eae355422b992aa80dcd6b59c2de1ee0d2949f62b3cf649cd62892976fee640358f7522867ed9d48a595d6f8f4e619df50
2024-07-11refactor: Mark IsBlockPruned constMarcoFalke
Member fields are used read-only in this method.
2024-07-10Merge bitcoin/bitcoin#29668: prune, rpc: Check undo data when finding ↵Ava Chow
pruneheight 8789dc8f315a9d9ad7142d831bc9412f780248e7 doc: Add note to getblockfrompeer on missing undo data (Fabian Jahr) 4a1975008b602aeacdad9a74d1837a7455148074 rpc: Make pruneheight also reflect undo data presence (Fabian Jahr) 96b4facc912927305b06a233cb8b36e7e5964c08 refactor, blockstorage: Generalize GetFirstStoredBlock (Fabian Jahr) Pull request description: The function `GetFirstStoredBlock()` helps us find the first block for which we have data. So far this function only looked for a block with `BLOCK_HAVE_DATA`. However, this doesn't mean that we also have the undo data of that block, and undo data might be required for what a user would like to do with those blocks. One example of how this might happen is if some blocks were fetched using the `getblockfrompeer` RPC. Blocks fetched from a peer will have data but no undo data. The first commit here allows `GetFirstStoredBlock()` to check for undo data as well by passing a parameter. This alone is useful for #29553 and I would use it there. In the second commit I am applying the undo check to the RPCs that report `pruneheight` to the user. I find this much more intuitive because I think the user expects to be able to do all operations on blocks up until the `pruneheight` but that is not the case if undo data is missing. I personally ran into this once before and now again when testing for assumeutxo when I had used `getblockfrompeer`. The following commit adds test coverage for this change of behavior. The last commit adds a note in the docs of `getblockfrompeer` that undo data will not be available. ACKs for top commit: achow101: ACK 8789dc8f315a9d9ad7142d831bc9412f780248e7 furszy: Code review ACK 8789dc8f315a9d9ad7142d831bc9412f780248e7. stickies-v: ACK 8789dc8f315a9d9ad7142d831bc9412f780248e7 Tree-SHA512: 90ae8bdd07a496ade579aa25240609c61c9ed173ad38d30533f6c631fe674e5a41727478ade69ca4b71a571ad94c9da4b33ebba6b5d8821109313c2de3bdfb3d
2024-06-21refactor, blockstorage: Generalize GetFirstStoredBlockFabian Jahr
GetFirstStoredBlock is generalized to check for any data status with a status mask that needs to be passed as a parameter. To reflect this the function is also renamed to GetFirstBlock. Co-authored-by: stickies-v <stickies-v@protonmail.com>
2024-06-07blockman: Replace m_reindexing with m_blockfiles_indexedRyan Ofsky
This is a just a mechanical change, renaming and inverting the meaning of the indexing variable. "m_blockfiles_indexed" is a more straightforward name for this variable because this variable just indicates whether or not <datadir>/blocks/blk?????.dat files have been indexed in the <datadir>/blocks/index LevelDB database. The name "m_reindexing" was more confusing, it could be true even if -reindex was not specified, and false when it was specified. Also, the previous name unnecessarily required thinking about the whole reindexing process just to understand simple checks in validation code about whether blocks were indexed. The motivation for this change is to follow up on previous commits, moving away from having multiple variables called "reindex" internally, and instead naming variables individually after what they do and represent.
2024-06-07kernel: Add less confusing reindex optionsRyan Ofsky
Drop confusing kernel options: BlockManagerOpts::reindex ChainstateLoadOptions::reindex ChainstateLoadOptions::reindex_chainstate Replacing them with more straightforward options: ChainstateLoadOptions::wipe_block_tree_db ChainstateLoadOptions::wipe_chainstate_db Having two options called "reindex" which did slightly different things was needlessly confusing (one option wiped the block tree database, and the other caused block files to be rescanned). Also the previous set of options did not allow rebuilding the block database without also rebuilding the chainstate database, when it should be possible to do those independently.
2024-05-17Merge bitcoin/bitcoin#29817: kernel: De-globalize fReindexAva Chow
b47bd959207e82555f07e028cc2246943d32d4c3 kernel: De-globalize fReindex (TheCharlatan) Pull request description: fReindex is one of the last remaining globals exposed by the kernel library, so move it into the blockstorage class to reduce the amount of global mutable state and make the kernel library a bit less awkward to use. --- This pull request is part of the [libbitcoinkernel project](https://github.com/bitcoin/bitcoin/issues/27587). ACKs for top commit: achow101: ACK b47bd959207e82555f07e028cc2246943d32d4c3 ryanofsky: Code review ACK b47bd959207e82555f07e028cc2246943d32d4c3. I rereviewed the whole PR, but the only change since last review was reverting the bugfix https://github.com/bitcoin/bitcoin/pull/29817#discussion_r1578327024 and make the change a pure refactoring. mzumsande: Code Review ACK b47bd959207e82555f07e028cc2246943d32d4c3 stickies-v: ACK b47bd959207e82555f07e028cc2246943d32d4c3 Tree-SHA512: f7399d01f93bc0c0c7428fe95d19b9d29b4ed00a4f1deabca78fb0c4fecb434ec971e890feecb105938b5247c926850b1b7b4a4a9caa333a061e40777d0c8463
2024-05-16kernel: De-globalize fReindexTheCharlatan
fReindex is one of the last remaining globals exposed by the kernel library, so move it into the blockstorage class to reduce the amount of global mutable state and make the kernel library a bit less awkward to use.
2024-05-14blockstorage: Rename FindBlockPos and have it return a FlatFilePosMartin Zumsande
The new name reflects that it is no longer called with existing blocks for which the position is already known. Returning a FlatFilePos directly simplifies the interface.
2024-05-14validation, blockstorage: Separate code paths for reindex and saving new blocksMartin Zumsande
By calling SaveBlockToDisk only when we actually want to save a new block to disk. In the reindex case, we now call UpdateBlockInfo directly from validation. This commit doesn't change behavior.
2024-05-14blockstorage: split up FindBlockPos functionMartin Zumsande
FindBlockPos does different things depending on whether the block is known or not, as shown by the fact that much of the existing code is conditional on fKnown set or not. If the block position is known (during reindex) the function only updates the block info statistics. It doesn't actually find a block position in this case. This commit removes fKnown and splits up these two code paths by introducing a separate function for the reindex case when the block position is known. It doesn't change behavior.
2024-05-14doc: Improve doc for functions involved in saving blocks to diskMartin Zumsande
In particular, document the flat file positions expected and returned by functions better. Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
2023-12-07refactor: Use reference instead of pointer in IsBlockPrunedMarcoFalke
This makes it harder to pass nullptr and cause issues such as https://github.com/bitcoin/bitcoin/commit/dde7ac5c704688c8a9af29bd07e5ae8114824ce7
2023-11-18blockstorage: switch from CAutoFile to AutoFileAnthony Towns
Also bump includes per suggestions from iwyu.
2023-10-20refactor: Remove CBlockFileInfo::SetNullMarcoFalke
2023-09-30blockstorage: segment normal/assumedvalid blockfilesJames O'Beirne
When using an assumedvalid (snapshot) chainstate along with a background chainstate, we are syncing two very different regions of the chain simultaneously. If we use the same blockfile space for both of these syncs, wildly different height blocks will be stored alongside one another, making pruning ineffective. This change implements a separate blockfile cursor for the assumedvalid chainstate when one is in use.
2023-09-30validation: populate nChainTx value for assumedvalid chainstatesJames O'Beirne
Use the expected AssumeutxoData in order to bootstrap nChainTx values for assumedvalid blockindex entries in the snapshot chainstate. This is necessary because nChainTx is normally built up from nTx values, which are populated using blockdata which the snapshot chainstate does not yet have.
2023-09-30validation: pruning for multiple chainstatesJames O'Beirne
Introduces ChainstateManager::GetPruneRange(). The prune budget is split evenly between the number of chainstates, however the prune budget may be exceeded if the resulting shares are beneath `MIN_DISK_SPACE_FOR_BLOCK_FILES`.
2023-09-29Merge bitcoin/bitcoin#27866: blockstorage: Return on fatal flush errorsRyan Ofsky
d8041d4e042957660827313951b18c8dd9a99a16 blockstorage: Return on fatal undo file flush error (TheCharlatan) f0207e00303a1030eca795ede231e3c0d94df061 blockstorage: Return on fatal block file flush error (TheCharlatan) 5671c15f4520c6dc20e0805fd0b06157ff94bcd7 blockstorage: Mark FindBlockPos as nodiscard (TheCharlatan) Pull request description: The goal of this PR is to establish that fatal blockstorage flush errors should be treated as errors at their call site. Prior to this patch `FlushBlockFile` may have failed without returning in `Chainstate::FlushStateToDisk`, leading to a potential write from `WriteBlockIndexDB` that may refer to a block that is not fully flushed to disk yet. By returning if either `FlushUndoFile` or `FlushBlockFile` fail, we ensure that no further write operations take place that may lead to an inconsistent database when crashing. Add `[[nodiscard]]` annotations to them such that they are not ignored in future. Functions that call either `FlushUndoFile` or `FlushBlockFile`, need to handle these extra abort cases properly. Since `Chainstate::FlushStateToDisk` already produces an abort error in case of `WriteBlockIndexDB` failing, no extra logic for functions calling `Chainstate::FlushStateToDisk` is required. Besides `Chainstate::FlushStateToDisk`, `FlushBlockFile` is also called by `FindBlockPos`, while `FlushUndoFile` is only called by `FlushBlockFile` and `WriteUndoDataForBlock`. For both these cases, the flush error is not further bubbled up. Instead, the error is logged and a comment is provided why bubbling up an error would be less desirable in these cases. --- This pull request is part of a larger effort towards improving the shutdown / abort / fatal error handling in validation code. It is a first step towards implementing proper fatal error return type enforcement similar as proposed by theuni in this pull request [comment](https://github.com/bitcoin/bitcoin/pull/27711#issuecomment-1563561502). For ease of review of these critical changes, a first step would be checking that `AbortNode` leads to early and error-conveying returns at its call site. Further work for enforcing returns when `AbortNode` is called is done in https://github.com/bitcoin/bitcoin/pull/27862. ACKs for top commit: stickies-v: re-ACK d8041d4 ryanofsky: Code review ACK d8041d4e042957660827313951b18c8dd9a99a16 Tree-SHA512: 47ade9b873b15e567c8f60ca538d5a0daf32163e1031be3212a3a45eb492b866664b225f2787c9e40f3e0c089140157d8fd1039abc00c7bdfeec1b52ecd7e219
2023-09-15Return CAutoFile from BlockManager::Open*File()MarcoFalke
This is a refactor.
2023-09-12kernel: Move MessageStartChars to its own fileTheCharlatan
The protocol.h file contains many non-consensus related definitions and should thus not be part of the libbitcoinkernel. This commit makes protocol.h no longer a required include for users of the libbitcoinkernel. This commit is part of the libbitcoinkernel project, namely its stage 1 step 3: Decouple most non-consensus headers from libbitcoinkernel. Co-Authored-By: Cory Fields <cory-nospam-@coryfields.com>
2023-09-12[refactor] Define MessageStartChars as std::arrayTheCharlatan
2023-09-05Merge bitcoin/bitcoin#28195: blockstorage: Drop legacy -txindex checkfanquake
fae405556d56f6f13ce57f69a06b9ec1e825422b scripted-diff: Rename CBlockTreeDB -> BlockTreeDB (MarcoFalke) faf63039cce40f5cf8dea5a1d24945773c3433a1 Fixup style of moved code (MarcoFalke) fa65111b99627289fd47dcfaa5197e0f09b8a50e move-only: Move CBlockTreeDB to node/blockstorage (MarcoFalke) fa8685597e7302fc136f21b6dd3a4b187fa8e251 index: Drop legacy -txindex check (MarcoFalke) fa69148a0a26c5054dbccdceeac8e117bf449275 scripted-diff: Use blocks_path where possible (MarcoFalke) Pull request description: The only reason for the check was to print a warning about an increase in storage use. Now that 22.x is EOL and everyone should have migrated (or decided to not care about storage use), remove the check. Also, a move-only commit is included. (Rebased from https://github.com/bitcoin/bitcoin/pull/22242) ACKs for top commit: TheCharlatan: ACK fae405556d56f6f13ce57f69a06b9ec1e825422b, though I lack historical context to really judge the second commit fa8685597e7302fc136f21b6dd3a4b187fa8e251. stickies-v: ACK fae405556d56f6f13ce57f69a06b9ec1e825422b Tree-SHA512: 9da8f48767ae52d8e8e21c09a40c949cc0838794f1856cc5f58a91acd3f00a3bca818c8082242b3fdc9ca5badb09059570bb3870850d3807b75a8e23b5222da1
2023-08-31blockstorage: Return on fatal undo file flush errorTheCharlatan
By returning an error code if either `FlushUndoFile` or `FlushBlockFile` fail, the caller now has to explicitly handle block undo file flushing errors. Before this change such errors were non-explicitly ignored without a clear rationale. Besides the call to `FlushUndoFile` in `FlushBlockFile`, ignore its return code at its call site in `WriteUndoDataForBlock`. There, a failed flush of the undo data should not be indicative of a failed write. Add [[nodiscard]] annotations to `FlushUndoFile` such that its return value is not just ignored in the future.
2023-08-31blockstorage: Return on fatal block file flush errorTheCharlatan
By returning an error code if `FlushBlockFile` fails, the caller now has to explicitly handle block file flushing errors. Before this change such errors were non-explicitly ignored without a clear rationale. Prior to this patch `FlushBlockFile` may have failed silently in `Chainstate::FlushStateToDisk`. Improve this with a log line. Also add a TODO comment to flesh out whether returning early in the case of an error is appropriate or not. Returning early might be appropriate to prohibit `WriteBlockIndexDB` from writing a block index entry that does not refer to a fully flushed block. Besides `Chainstate::FlushStateToDisk`, `FlushBlockFile` is also called by `FindBlockPos`. Don't change the abort behavior there, since we don't want to fail the function if the flushing of already written blocks fails. Instead, just document it.
2023-08-31blockstorage: Mark FindBlockPos as nodiscardTheCharlatan
A false return value indicates a fatal error (disk space being too low), so make sure we always consume this error code. This commit is part of an ongoing process for making the handling of fatal errors more transparent and easier to understand.
2023-08-07Merge bitcoin/bitcoin#28191: refactor: Remove unused MessageStartChars ↵fanquake
parameters from BlockManager methods fa69e3a95c452c2ba3221b17c19fba5993b5d073 Remove unused MessageStartChars parameters from BlockManager methods (MarcoFalke) Pull request description: Seems odd to expose these for mocking, when it is not needed. Fix this by removing the the unused parameters and use the already existing member field instead. ACKs for top commit: Empact: utACK fa69e3a95c452c2ba3221b17c19fba5993b5d073 dergoegge: utACK fa69e3a95c452c2ba3221b17c19fba5993b5d073 Tree-SHA512: 7814e9560abba8d9c0926bcffc70f92e502d22f543af43671248f6fcd1433f35238553c0f05123fde6d8e0f80261af0ab0500927548115153bd68d57fe2da746
2023-08-02scripted-diff: Rename CBlockTreeDB -> BlockTreeDBMarcoFalke
-BEGIN VERIFY SCRIPT- sed -i 's|CBlockTreeDB|BlockTreeDB|g' $( git grep -l CBlockTreeDB ) -END VERIFY SCRIPT-
2023-08-01move-only: Move CBlockTreeDB to node/blockstorageMarcoFalke
The block index (CBlockTreeDB) is required to write and read blocks, so move it to blockstorage. This allows to drop the txdb.h include from `node/blockstorage.h`. Can be reviewed with: --color-moved=dimmed-zebra --color-moved-ws=ignore-all-space
2023-07-31Remove unused MessageStartChars parameters from BlockManager methodsMarcoFalke
2023-07-14Remove CChain dependency in node/blockstorageSuhas Daftuar
2023-07-14Explicitly track maximum block height stored in undo filesSuhas Daftuar
When writing a new block to disk, if we have filled up the current block file, then we flush and truncate that block file (to free allocated but unused space) before advancing to the next one. When this happens, we have to determine whether to also flush and truncate the corresponding undo file. Undo data is only written when blocks are connected, not when blocks are received. Thus it's possible that the corresponding undo file already has all the data it will ever have, and we should flush/truncate it as we advance files; or it's possible that there is more data we expect to write, and should therefore defer flush/truncation until undo data is later written. Prior to this commit, we made the determination of whether the undo file was full of all requisite data by comparing against the chain tip. This patch replaces that dependence on validation data structures by instead just tracking the highest height of any block written in the undo file as we go.
2023-07-11refactor: Move stopafterblockimport handling out of blockstorageTheCharlatan
This has the benefit of moving the StartShutdown call out of the blockstorage file and thus out of the kernel's responsibility. The user can now decide if he wants to start shutdown / interrupt after a block import or not.
2023-07-10index: verify blocks data existence only oncefurszy
At present, during init, we traverse the chain (once per index) to confirm that all necessary blocks to sync each index up to the current tip are present. To make the process more efficient, we can fetch the oldest block from the indexers and perform the chain data existence check from that point only once. This also moves the pruning violation check to the end of the 'loadinit' thread, which is where the reindex, block loading and chain activation processes happen. Making the node's startup process faster, allowing us to remove the global g_indexes_ready_to_sync flag, and enabling the execution of the pruning violation verification even when the reindex or reindex-chainstate flags are enabled (which has being skipped so far).
2023-07-10refactor: simplify pruning violation checkfurszy
By generalizing 'GetFirstStoredBlock' and implementing 'CheckBlockDataAvailability' we can dedup code and avoid repeating work when multiple indexes are enabled. E.g. get the oldest block across all indexes and perform the pruning violation check from that point up to the tip only once (this feature is being introduced in a follow-up commit). This commit shouldn't change behavior in any way. Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
2023-07-10make GetFirstStoredBlock assert that 'start_block' always has datafurszy
And transfer the responsibility of verifying whether 'start_block' has data or not to the caller. This is because the 'GetFirstStoredBlock' function responsibility is to return the first block containing data. And the current implementation can return 'start_block' when it has no data!. Which is misleading at least. Edge case behavior change: Previously, if the block tip lacked data but all preceding blocks contained data, there was no prune violation. And now, such scenario will result in a prune violation.
2023-07-07scripted-diff: rename 'loadblk' thread name to 'initload'furszy
The thread does not only load blocks, it loads the mempool and, in a future commit, will start the indexes as well. Also, renamed the 'ThreadImport' function to 'ImportBlocks' And the 'm_load_block' class member to 'm_thread_load'. -BEGIN VERIFY SCRIPT- sed -i "s/ThreadImport/ImportBlocks/g" $(git grep -l ThreadImport -- ':!/doc/') sed -i "s/loadblk/initload/g" $(git grep -l loadblk -- ':!/doc/release-notes/') sed -i "s/m_load_block/m_thread_load/g" $(git grep -l m_load_block) -END VERIFY SCRIPT-
2023-07-07init: start indexes sync earlierfurszy
The mempool load can take a while, and it is not needed for the indexes' synchronization. Also, having the mempool load function call inside 'blockstorage.cpp' wasn't structurally correct.
2023-06-28kernel: Pass interrupt reference to chainmanTheCharlatan
This and the following commit seek to decouple the libbitcoinkernel library from the shutdown code. As a library, it should it should have its own flexible interrupt infrastructure without relying on node-wide globals. The commit takes the first step towards this goal by de-globalising `ShutdownRequested` calls in kernel code. Co-authored-by: Russell Yanofsky <russ@yanofsky.org> Co-authored-by: TheCharlatan <seb.kung@gmail.com>
2023-05-17index: Enable reindex-chainstate with active indexesMartin Zumsande
This is achieved by letting the index sync thread wait until reindex-chainstate is finished. This also disables the pruning check when reindexing the chainstate (which is incompatible with prune mode) because there would be no chain at this point in init.
2023-05-10refactor, blockstorage: Replace stopafterblockimport argTheCharlatan
Add a stop_after_block_import field to the BlockManager options. Use this field instead of the global gArgs. This should allow users of the BlockManager to not rely on the global Args.
2023-05-10refactor: Move functions to BlockManager methodsTheCharlatan
This is a commit in preparation for the next few commits. The functions are moved to methods to avoid their re-declaration for the purpose of passing in BlockManager options. The functions that were now moved into the BlockManager should no longer use the params as an argument, but instead use the member variable. In the moved ReadBlockFromDisk and UndoReadFromDisk, change the function signature to accept a reference to a CBlockIndex instead of a raw pointer. The pointer is expected to be non-null, so reflect that in the type. To allow for the move of functions to BlockManager methods all call sites require an instantiated BlockManager, or a callback to one.
2023-05-04Remove unused chainparams from BlockManager methodsMarcoFalke
Also, replace pointer with reference while touching the signature.
2023-05-04Add BlockManagerOpts::chainparams referenceMarcoFalke
and use it in blockstorage.cpp
2023-03-23refactor: Move fs.* to util/fs.*TheCharlatan
The fs.* files are already part of the libbitcoin_util library. With the introduction of the fs_helpers.* it makes sense to move fs.* into the util/ directory as well.
2023-03-15refactor: Add and use PRUNE_TARGET_MANUAL constexprMarcoFalke
2023-03-15Move ::fImporting to BlockManagerMarcoFalke
2023-03-15Move ::fPruneMode into BlockManagerMarcoFalke