Disallow attestation production earlier than head (#2130)

## Issue Addressed

The non-finality period on Pyrmont between epochs [`9114`](https://pyrmont.beaconcha.in/epoch/9114) and [`9182`](https://pyrmont.beaconcha.in/epoch/9182) was contributed to by all the `lighthouse_team` validators going down. The nodes saw excessive CPU and RAM usage, resulting in the system to kill the `lighthouse bn` process. The `Restart=on-failure` directive for `systemd` caused the process to bounce in ~10-30m intervals.

Diagnosis with `heaptrack` showed that the `BeaconChain::produce_unaggregated_attestation` function was calling `store::beacon_state::get_full_state` and sometimes resulting in a tree hash cache allocation. These allocations were approximately the size of the hosts physical memory and still allocated when `lighthouse bn` was killed by the OS.

There was no CPU analysis (e.g., `perf`), but the `BeaconChain::produce_unaggregated_attestation` is very CPU-heavy so it is reasonable to assume it is the cause of the excessive CPU usage, too.

## Proposed Changes

`BeaconChain::produce_unaggregated_attestation` has two paths:

1. Fast path: attesting to the head slot or later.
2. Slow path: attesting to a slot earlier than the head block.

Path (2) is the only path that calls `store::beacon_state::get_full_state`, therefore it is the path causing this excessive CPU/RAM usage.

This PR removes the current functionality of path (2) and replaces it with a static error (`BeaconChainError::AttestingPriorToHead`).

This change reduces the generality of `BeaconChain::produce_unaggregated_attestation` (and therefore [`/eth/v1/validator/attestation_data`](https://ethereum.github.io/eth2.0-APIs/#/Validator/produceAttestationData)), but I argue that this functionality is an edge-case and arguably a violation of the [Honest Validator spec](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/validator.md).

It's possible that a validator goes back to a prior slot to "catch up" and submit some missed attestations. This change would prevent such behaviour, returning an error. My concerns with this catch-up behaviour is that it is:

- Not specified as "honest validator" attesting behaviour.
- Is behaviour that is risky for slashing (although, all validator clients *should* have slashing protection and will eventually fail if they do not).
- It disguises clock-sync issues between a BN and VC.

## Additional Info

It's likely feasible to implement path (2) if we implement some sort of caching mechanism. This would be a multi-week task and this PR gets the issue patched in the short term. I haven't created an issue to add path (2), instead I think we should implement it if we get user-demand.
This commit is contained in:
Paul Hauner 2021-01-20 06:52:37 +00:00
parent d9f940613f
commit b06559ae97
3 changed files with 35 additions and 51 deletions

View File

@ -853,37 +853,22 @@ impl<T: BeaconChainTypes> BeaconChain<T> {
Cow::Borrowed(&head.beacon_state), Cow::Borrowed(&head.beacon_state),
) )
} else { } else {
// Note: this method will fail if `slot` is more than `state.block_roots.len()` slots // We disallow producing attestations *prior* to the current head since such an
// prior to the head. // attestation would require loading a `BeaconState` from disk. Loading `BeaconState`
// from disk is very resource intensive and proposes a DoS risk from validator clients.
// //
// This seems reasonable, producing an attestation at a slot so far // Although we generally allow validator clients to do things that might harm us (i.e.,
// in the past seems useless, definitely in mainnet spec. In minimal spec, when the // we trust them), sometimes we need to protect the BN from accidental errors which
// block roots only contain two epochs of history, it's possible that you will fail to // could cause it significant harm.
// produce an attestation that would be valid to be included in a block. Given that
// minimal is only for testing, I think this is fine.
// //
// It is important to note that what's _not_ allowed here is attesting to a slot in the // This case is particularity harmful since the HTTP API can effectively call this
// past. You can still attest to a block an arbitrary distance in the past, just not as // function an unlimited amount of times. If `n` validators all happen to call it at
// if you are in a slot in the past. // the same time, we're going to load `n` states (and tree hash caches) into memory all
let beacon_block_root = *head.beacon_state.get_block_root(slot)?; // at once. With `n >= 10` we're looking at hundreds of MB or GBs of RAM.
let state_root = *head.beacon_state.get_state_root(slot)?; Err(Error::AttestingPriorToHead {
head_slot: head.beacon_block.slot(),
// Avoid holding a lock on the head whilst doing database reads. Good boi functions request_slot: slot,
// don't hog locks. })
drop(head);
let mut state = self
.get_state(&state_root, Some(slot))?
.ok_or(Error::MissingBeaconState(state_root))?;
state.build_committee_cache(RelativeEpoch::Current, &self.spec)?;
self.produce_unaggregated_attestation_for_block(
slot,
index,
beacon_block_root,
Cow::Owned(state),
)
} }
} }

View File

@ -92,6 +92,10 @@ pub enum BeaconChainError {
}, },
WeakSubjectivtyVerificationFailure, WeakSubjectivtyVerificationFailure,
WeakSubjectivtyShutdownError(TrySendError<&'static str>), WeakSubjectivtyShutdownError(TrySendError<&'static str>),
AttestingPriorToHead {
head_slot: Slot,
request_slot: Slot,
},
} }
easy_from_to!(SlotProcessingError, BeaconChainError); easy_from_to!(SlotProcessingError, BeaconChainError);

View File

@ -25,6 +25,7 @@ lazy_static! {
#[test] #[test]
fn produces_attestations() { fn produces_attestations() {
let num_blocks_produced = MainnetEthSpec::slots_per_epoch() * 4; let num_blocks_produced = MainnetEthSpec::slots_per_epoch() * 4;
let additional_slots_tested = MainnetEthSpec::slots_per_epoch() * 3;
let harness = BeaconChainHarness::new_with_store_config( let harness = BeaconChainHarness::new_with_store_config(
MainnetEthSpec, MainnetEthSpec,
@ -32,38 +33,32 @@ fn produces_attestations() {
StoreConfig::default(), StoreConfig::default(),
); );
// Skip past the genesis slot.
harness.advance_slot();
harness.extend_chain(
num_blocks_produced as usize,
BlockStrategy::OnCanonicalHead,
AttestationStrategy::AllValidators,
);
let chain = &harness.chain; let chain = &harness.chain;
let state = &harness.chain.head().expect("should get head").beacon_state;
assert_eq!(state.slot, num_blocks_produced, "head should have updated");
assert_ne!(
state.finalized_checkpoint.epoch, 0,
"head should have updated"
);
let current_slot = chain.slot().expect("should get slot");
// Test all valid committee indices for all slots in the chain. // Test all valid committee indices for all slots in the chain.
for slot in 0..=current_slot.as_u64() + MainnetEthSpec::slots_per_epoch() * 3 { // for slot in 0..=current_slot.as_u64() + MainnetEthSpec::slots_per_epoch() * 3 {
for slot in 0..=num_blocks_produced + additional_slots_tested {
if slot > 0 && slot <= num_blocks_produced {
harness.advance_slot();
harness.extend_chain(
1,
BlockStrategy::OnCanonicalHead,
AttestationStrategy::AllValidators,
);
}
let slot = Slot::from(slot); let slot = Slot::from(slot);
let mut state = chain let mut state = chain
.state_at_slot(slot, StateSkipConfig::WithStateRoots) .state_at_slot(slot, StateSkipConfig::WithStateRoots)
.expect("should get state"); .expect("should get state");
let block_slot = if slot > current_slot { let block_slot = if slot <= num_blocks_produced {
current_slot
} else {
slot slot
} else {
Slot::from(num_blocks_produced)
}; };
let block = chain let block = chain
.block_at_slot(block_slot) .block_at_slot(block_slot)
.expect("should get block") .expect("should get block")