bitcoin core development – What are the use cases where very old rev*.dat files are needed?
- and they’re 1::1 with block files (i.e., for a given NNNNN the files revNNNNN.dat hold information for the same
- and they’re written and chunked in the order in which blocks are received by the node (from the network)
This assumption is incorrect. The rev*.dat files are actually written in height order, because undo data is written at a different time from block data. The undo data for a block is written only after that block has been connected to the chain tip, so it ends up in the order in which blocks are connected. For blocks that are downloaded but never connected, no undo data exists (although this case is unlikely, as blocks aren’t downloaded unless they are expected to be connected).
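The difference between the two orderings can be illustrated with a toy model. This is only a sketch: the function `simulate` and its bookkeeping are hypothetical and do not reflect Bitcoin Core's actual storage code. It assumes block data is appended on receipt and undo data is appended on connection, as described above.

```python
def simulate(arrival_order):
    """Toy model: return (block data write order, undo data write order).

    Heights are used as block identifiers. Block data is appended when a
    block arrives; undo data is appended only when the block is connected
    to the tip. Height 0 (genesis) is treated as already connected.
    """
    blk_order = []    # heights in the order block data is written
    rev_order = []    # heights in the order undo data is written
    received = set()
    tip = 0

    for height in arrival_order:
        blk_order.append(height)
        received.add(height)
        # Connect every received block that now extends the tip.
        while tip + 1 in received:
            tip += 1
            rev_order.append(tip)
    return blk_order, rev_order


blk_order, rev_order = simulate([2, 1, 4, 5, 3])
print(blk_order)  # [2, 1, 4, 5, 3]
print(rev_order)  # [1, 2, 3, 4, 5]
```

Blocks that arrive out of order are stored as they come in, but their undo data only appears once every ancestor has been connected.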
Then: Are very old rev*.dat files ever used? Say, those that belong to blocks buried more than 100 blocks deep from the top of the chain? If they are ever used what is the use case?
When it comes to the operation of the node, no, old undo data is not actually being used. However, it does have some uses outside of node operation, particularly in examining the blockchain via RPC and in building the block filter and coin stats indices.
The RPC uses undo data in some places because it contains the UTXOs that were spent by that block. This allows getblock to calculate the fee paid by each transaction in the block. However, this is not a hard requirement: if the undo data is not found, getblock simply doesn’t calculate the fees.
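The fee calculation itself is simple arithmetic once the spent values are known. A minimal sketch, assuming a hypothetical layout where each non-coinbase transaction is just a list of output values in satoshis and the undo data is a parallel list of spent values (this is not Bitcoin Core's data structure):

```python
def tx_fee(spent_values, output_values):
    """Fee = total value of the spent outputs minus total value of the new outputs."""
    return sum(spent_values) - sum(output_values)


def block_fees(tx_outputs, undo):
    """Return one fee per non-coinbase transaction, or None without undo data.

    tx_outputs: list of output-value lists, one per non-coinbase transaction.
    undo:       list of spent-value lists aligned with tx_outputs, or None
                if the undo data is unavailable (hypothetical representation).
    """
    if undo is None:
        return None  # fees are simply omitted, mirroring the behaviour described above
    return [tx_fee(spent, outs) for spent, outs in zip(undo, tx_outputs)]


print(block_fees([[900, 50], [400]], [[1000], [250, 200]]))  # [50, 50]
print(block_fees([[900, 50], [400]], None))                  # None
```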
Another RPC that uses the undo data is getblockstats, which uses it to calculate the fees for the block as well as the change in size of the UTXO set that the block causes. If the undo data does not exist, this RPC fails.
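To see why the UTXO set change depends on undo data, here is a rough sketch. The function name, the dictionary keys, and the idea of representing each output by its serialized size are all assumptions made for illustration, not the RPC's real implementation or field names.

```python
def utxo_set_delta(created_sizes, spent_sizes):
    """Net change to the UTXO set caused by one block.

    created_sizes: sizes (in bytes) of the outputs the block creates.
    spent_sizes:   sizes of the outputs the block spends, taken from the
                   undo data, or None if the undo data is missing.
    """
    if spent_sizes is None:
        # Without undo data the spent outputs are unknown, so fail outright.
        raise LookupError("undo data not available for this block")
    return {
        "count_change": len(created_sizes) - len(spent_sizes),
        "size_change": sum(created_sizes) - sum(spent_sizes),
    }


print(utxo_set_delta([40, 40, 43], [40, 50]))
# {'count_change': 1, 'size_change': 33}
```

The created outputs can be read from the block alone; only the spent side needs the undo data.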
For both the block filter and coin stats indices, the undo data is used to build them because it provides a snapshot of the UTXO set changes made. This allows the index to be populated without having to track the UTXO set while it is being built, so it reduces the memory usage and increases performance.
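As an illustration for the block filter case: a BIP 158 basic filter covers the scriptPubKeys of the block's outputs (excluding OP_RETURN outputs) and the scriptPubKeys of the outputs it spends. The second set is exactly what the undo data supplies. The sketch below only collects the filter elements from hypothetical lists of raw scripts; it does not build the actual Golomb-coded filter.

```python
OP_RETURN = 0x6A


def basic_filter_elements(output_scripts, spent_scripts):
    """Collect the scripts that would go into a basic block filter.

    output_scripts: scriptPubKeys (bytes) of outputs created by the block.
    spent_scripts:  scriptPubKeys (bytes) of outputs spent by the block,
                    as read from the undo data (no UTXO set lookup needed).
    """
    elements = set()
    for script in output_scripts:
        # Skip empty scripts and OP_RETURN outputs.
        if script and script[0] != OP_RETURN:
            elements.add(script)
    for script in spent_scripts:
        if script:
            elements.add(script)
    return elements


elements = basic_filter_elements([b"\x6a\x01x", b"\x51", b""], [b"\x52", b""])
print(sorted(elements))  # [b'Q', b'R']
```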
For node operation, undo data is only needed if the block were to be disconnected from the tip, which only occurs when a reorg happens. However, Bitcoin is built to allow for the possibility of an extremely large reorg that removes blocks considered ancient history.
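How undo data makes disconnection possible can be sketched with a toy UTXO set. Everything here is a simplified assumption: transactions are dictionaries with hypothetical `txid`, `inputs`, and `outputs` keys, and a coin is reduced to its value.

```python
def connect_block(utxos, block):
    """Apply a block to the UTXO dict in place and return its undo data.

    The undo data is one list per transaction of (outpoint, value) pairs
    for the coins that transaction spent.
    """
    undo = []
    for tx in block:
        spent = [(outpoint, utxos.pop(outpoint)) for outpoint in tx["inputs"]]
        undo.append(spent)
        for index, value in enumerate(tx["outputs"]):
            utxos[(tx["txid"], index)] = value
    return undo


def disconnect_block(utxos, block, undo):
    """Reverse connect_block: walk the transactions backwards, remove the
    outputs each one created, then restore the coins it spent."""
    for tx, spent in zip(reversed(block), reversed(undo)):
        for index in range(len(tx["outputs"])):
            del utxos[(tx["txid"], index)]
        for outpoint, value in spent:
            utxos[outpoint] = value


utxos = {("a", 0): 50}
block = [
    {"txid": "b", "inputs": [("a", 0)], "outputs": [30, 15]},
    {"txid": "c", "inputs": [("b", 0)], "outputs": [29]},  # spends within the block
]
undo = connect_block(utxos, block)
print(utxos)  # {('b', 1): 15, ('c', 0): 29}
disconnect_block(utxos, block, undo)
print(utxos)  # {('a', 0): 50}
```

The spent coins are gone from the UTXO set after connection, so without the undo data there is nothing local to restore them from.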
That said, old undo data is indeed not used in normal operation of the node, and a mode could be added that deletes it. On my node, the undo data totals 48 GB, so the space savings would not be nearly as significant as with normal pruning.
(and, w.r.t. the answer here of the first question in the following list, what is the database corruption that occurs – presumably to the UTXO state – if the rev files are all deleted and why is it necessary to regenerate them?)
They need to be regenerated because of how the block index works. The block index also records the location of each block’s undo data, so if that data is missing, the node treats it as an error and requires a reindex. If an option were added to allow the deletion of rev*.dat files, that behavior would obviously have to change.
(bonus question: given a txindex: is there anything in the rev*.dat file that can’t be regenerated simply by looking at the blocks in the corresponding blk*.dat file and using the txindex to find previous transactions?)
Yes, the undo data could be regenerated that way, but it would be pretty slow.
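A rough sketch of that regeneration, under the assumption that the txindex can be modelled as a dictionary from txid to transaction (the names `rebuild_undo` and `txindex`, and the dictionary layout, are hypothetical):

```python
def rebuild_undo(block, txindex):
    """Rebuild per-transaction undo data by looking up every spent output.

    block:   list of transactions; the first one is the coinbase and is
             skipped because it spends nothing.
    txindex: mapping txid -> transaction, standing in for the real txindex.
    """
    undo = []
    for tx in block[1:]:
        # One lookup per input; in a real node each lookup means locating
        # and reading the previous transaction from disk.
        undo.append([txindex[txid]["outputs"][n] for txid, n in tx["inputs"]])
    return undo


txindex = {"a": {"outputs": [50, 20]}, "b": {"outputs": [7]}}
block = [
    {"txid": "cb", "inputs": [], "outputs": [1]},
    {"txid": "c", "inputs": [("a", 1), ("b", 0)], "outputs": [26]},
]
print(rebuild_undo(block, txindex))  # [[20, 7]]
```

The per-input lookup is where the slowness comes from, compared with reading the undo data sequentially from the rev*.dat files.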