Don't return errors on HTTP API for already-known messages (#3341)

## Issue Addressed

- Resolves #3266

## Proposed Changes

Return 200 OK rather than an error when a block, attestation or sync message is already known.

Presently, we log and return an error, which causes the BN to appear "offline" from the VC's perspective. The fallback mechanism then does extra work to avoid and upcheck the supposedly offline node. This can be observed as instability in the `vc_beacon_nodes_available_count` metric.

The current behaviour also causes scary logs for the user. There's nothing to *actually* be concerned about when we see duplicate messages; they can happen on fallback systems (see code comments).

## Additional Info

NA
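As an illustration of the behaviour described under Proposed Changes, here is a minimal, self-contained sketch of the pattern: already-known messages are counted and skipped, and only genuine verification failures turn the response into an error. The names `VerifyError`, `Failure`, `verify` and `publish_batch` are hypothetical stand-ins invented for this example; they are not the Lighthouse types or functions touched by this commit.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the verification errors matched in the diff.
#[derive(Debug)]
enum VerifyError {
    /// The message was seen before (e.g. published via another BN).
    AlreadyKnown,
    /// The message is genuinely invalid.
    Invalid(String),
}

/// Hypothetical per-message failure reported back to the API caller.
#[derive(Debug, PartialEq)]
struct Failure {
    index: usize,
    message: String,
}

/// Toy verifier: empty messages are invalid, repeated messages are already known.
fn verify(msg: &str, seen: &mut HashSet<String>) -> Result<(), VerifyError> {
    if msg.is_empty() {
        return Err(VerifyError::Invalid("empty message".to_string()));
    }
    if !seen.insert(msg.to_string()) {
        return Err(VerifyError::AlreadyKnown);
    }
    Ok(())
}

/// Processes a batch. Returns the number of already-known messages on success,
/// or the list of real failures otherwise. Duplicates never count as failures.
fn publish_batch(msgs: &[&str], seen: &mut HashSet<String>) -> Result<usize, Vec<Failure>> {
    let mut failures = Vec::new();
    let mut num_already_known = 0;
    for (index, &msg) in msgs.iter().enumerate() {
        match verify(msg, seen) {
            Ok(()) => {}
            // Skip duplicates without recording a failure.
            Err(VerifyError::AlreadyKnown) => {
                num_already_known += 1;
                continue;
            }
            Err(VerifyError::Invalid(message)) => failures.push(Failure { index, message }),
        }
    }
    if failures.is_empty() {
        Ok(num_already_known)
    } else {
        Err(failures)
    }
}
```

In this sketch, a VC that retries the same batch against a second BN would get a success result with a non-zero duplicate count rather than an error, which is the outcome this change aims for.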
2024-12-24 20:47:17 +00:00 · 2022-08-10 07:52:57 +00:00 · 2022-08-10 07:52:57 +00:00 · 2de26b20f8
commit 2de26b20f8
parent 052d5cf31f
3 changed files with 105 additions and 3 deletions
--- a/beacon_node/http_api/src/lib.rs
+++ b/beacon_node/http_api/src/lib.rs
@@ -1168,12 +1168,46 @@ pub fn serve<T: BeaconChainTypes>(
                blocking_json_task(move || {
                    let seen_timestamp = timestamp_now();
                    let mut failures = Vec::new();
                    let mut num_already_known = 0;
                    for (index, attestation) in attestations.as_slice().iter().enumerate() {
                        let attestation = match chain
                            .verify_unaggregated_attestation_for_gossip(attestation, None)
                        {
                            Ok(attestation) => attestation,
                            Err(AttnError::PriorAttestationKnown { .. }) => {
                                num_already_known += 1;
                                // Skip to the next attestation since an attestation for this
                                // validator is already known in this epoch.
                                //
                                // There's little value for the network in validating a second
                                // attestation for this validator since it is either:
                                //
                                // 1. A duplicate.
                                // 2. Slashable.
                                // 3. Invalid.
                                //
                                // We are likely to get duplicates in the case where a VC is using
                                // fallback BNs. If the first BN actually publishes some/all of a
                                // batch of attestations but fails to respond in a timely fashion,
                                // the VC is likely to try publishing the attestations on another
                                // BN. That second BN may have already seen the attestations from
                                // the first BN and therefore indicate that the attestations are
                                // "already seen". An attestation that has already been seen has
                                // been published on the network so there's no actual error from
                                // the perspective of the user.
                                //
                                // It's better to prevent slashable attestations from ever
                                // appearing on the network than trying to slash validators,
                                // especially those validators connected to the local API.
                                //
                                // There might be *some* value in determining that this attestation
                                // is invalid, but since a valid attestation already exists, it
                                // appears that this validator is capable of producing valid
                                // attestations and there's no immediate cause for concern.
                                continue;
                            }
                            Err(e) => {
                                error!(log,
                                    "Failure verifying attestation for gossip";
@@ -1240,6 +1274,15 @@ pub fn serve<T: BeaconChainTypes>(
                            ));
                        }
                    }
                    if num_already_known > 0 {
                        debug!(
                            log,
                            "Some unagg attestations already known";
                            "count" => num_already_known
                        );
                    }
                    if failures.is_empty() {
                        Ok(())
                    } else {
@@ -2234,6 +2277,16 @@ pub fn serve<T: BeaconChainTypes>(
                            // identical aggregates, especially if they're using the same beacon
                            // node.
                            Err(AttnError::AttestationAlreadyKnown(_)) => continue,
                            // If we've already seen this aggregator produce an aggregate, just
                            // skip this one.
                            //
                            // We're likely to see this with VCs that use fallback BNs. The first
                            // BN might time-out *after* publishing the aggregate and then the
                            // second BN will indicate it's already seen the aggregate.
                            //
                            // There's no actual error for the user or the network since the
                            // aggregate has been successfully published by some other node.
                            Err(AttnError::AggregatorAlreadyKnown(_)) => continue,
                            Err(e) => {
                                error!(log,
                                    "Failure verifying aggregate and proofs";
--- a/beacon_node/http_api/src/publish_blocks.rs
+++ b/beacon_node/http_api/src/publish_blocks.rs
@@ -1,9 +1,9 @@
 use crate::metrics;
 use beacon_chain::validator_monitor::{get_block_delay_ms, timestamp_now};
-use beacon_chain::{BeaconChain, BeaconChainTypes, CountUnrealized};
+use beacon_chain::{BeaconChain, BeaconChainTypes, BlockError, CountUnrealized};
 use lighthouse_network::PubsubMessage;
 use network::NetworkMessage;
-use slog::{crit, error, info, Logger};
+use slog::{crit, error, info, warn, Logger};
 use slot_clock::SlotClock;
 use std::sync::Arc;
 use tokio::sync::mpsc::UnboundedSender;
@@ -86,6 +86,27 @@ pub async fn publish_block<T: BeaconChainTypes>(
            Ok(())
        }
        Err(BlockError::BlockIsAlreadyKnown) => {
            info!(
                log,
                "Block from HTTP API already known";
                "block" => ?block.canonical_root(),
                "slot" => block.slot(),
            );
            Ok(())
        }
        Err(BlockError::RepeatProposal { proposer, slot }) => {
            warn!(
                log,
                "Block ignored due to repeat proposal";
                "msg" => "this can happen when a VC uses fallback BNs. \
                    whilst this is not necessarily an error, it can indicate issues with a BN \
                    or between the VC and BN.",
                "slot" => slot,
                "proposer" => proposer,
            );
            Ok(())
        }
        Err(e) => {
            let msg = format!("{:?}", e);
            error!(
--- a/beacon_node/http_api/src/sync_committees.rs
+++ b/beacon_node/http_api/src/sync_committees.rs
@@ -11,7 +11,7 @@ use beacon_chain::{
 use eth2::types::{self as api_types};
 use lighthouse_network::PubsubMessage;
 use network::NetworkMessage;
-use slog::{error, warn, Logger};
+use slog::{debug, error, warn, Logger};
 use slot_clock::SlotClock;
 use std::cmp::max;
 use std::collections::HashMap;
@@ -189,6 +189,24 @@ pub fn process_sync_committee_signatures<T: BeaconChainTypes>(
                    verified_for_pool = Some(verified);
                }
                // If this validator has already published a sync message, just ignore this message
                // without returning an error.
                //
                // This is likely to happen when a VC uses fallback BNs. If the first BN publishes
                // the message and then fails to respond in a timely fashion then the VC will move
                // to the second BN. The BN will then report that this message has already been
                // seen, which is not actually an error as far as the network or user are concerned.
                Err(SyncVerificationError::PriorSyncCommitteeMessageKnown {
                    validator_index,
                    slot,
                }) => {
                    debug!(
                        log,
                        "Ignoring already-known sync message";
                        "slot" => slot,
                        "validator_index" => validator_index,
                    );
                }
                Err(e) => {
                    error!(
                        log,
@@ -283,6 +301,16 @@ pub fn process_signed_contribution_and_proofs<T: BeaconChainTypes>(
            // If we already know the contribution, don't broadcast it or attempt to
            // further verify it. Return success.
            Err(SyncVerificationError::SyncContributionAlreadyKnown(_)) => continue,
            // If we've already seen this aggregator produce an aggregate, just
            // skip this one.
            //
            // We're likely to see this with VCs that use fallback BNs. The first
            // BN might time-out *after* publishing the aggregate and then the
            // second BN will indicate it's already seen the aggregate.
            //
            // There's no actual error for the user or the network since the
            // aggregate has been successfully published by some other node.
            Err(SyncVerificationError::AggregatorAlreadyKnown(_)) => continue,
            Err(e) => {
                error!(
                    log,