go-pulse/p2p/simulations/mocker_test.go
Ferenc Szabo 50b872bf05 p2p, swarm: fix node up races by granular locking (#18976)
* swarm/network: DRY out repeated giga comment

I do not necessarily agree with the way we wait for event propagation.
But I truly disagree with having duplicated giga comments.

* p2p/simulations: encapsulate Node.Up field so we avoid data races

The Node.Up field was accessed concurrently without proper locking.
There was a lock on Network, and it was sometimes used to guard access
to the field. Other times the locking was missed, and we had
a data race.

For example: https://github.com/ethereum/go-ethereum/pull/18464
The case above was solved, but there were still intermittent/hard to
reproduce races. So let's solve the issue permanently.

resolves: ethersphere/go-ethereum#1146
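
A minimal sketch of the encapsulation described above, not the actual
go-ethereum code. The `node` type, its `up` field and the `upMu` mutex are
hypothetical names chosen for illustration; the point is only that the
field gets its own lock and is reachable solely through accessors.

```go
package main

import (
	"fmt"
	"sync"
)

// node is a hypothetical stand-in for simulations.Node.
type node struct {
	upMu sync.RWMutex // guards up only, independent of any network-wide lock
	up   bool
}

// Up reports whether the node is running.
func (n *node) Up() bool {
	n.upMu.RLock()
	defer n.upMu.RUnlock()
	return n.up
}

// SetUp records the running state of the node.
func (n *node) SetUp(up bool) {
	n.upMu.Lock()
	defer n.upMu.Unlock()
	n.up = up
}

func main() {
	n := &node{}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			n.SetUp(true) // concurrent writers are safe behind the accessor
			_ = n.Up()    // concurrent readers as well
		}()
	}
	wg.Wait()
	fmt.Println(n.Up())
}
```

Callers can no longer touch the field directly, so they cannot forget
the lock.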

* p2p/simulations: fix unmarshal of simulations.Node

Making the Node.Up field private in 13292ee897e345045fbfab3bda23a77589a271c1
broke TestHTTPNetwork and TestHTTPSnapshot, because the default
UnmarshalJSON does not handle unexported fields.

Important: The fix is partial and not as thorough as I would like. I cut
scope because I think a proper fix may require a change to the current
serialization format. New ticket:
https://github.com/ethersphere/go-ethereum/issues/1177
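
To illustrate the problem in isolation, here is a hedged sketch that is
not the fix from this commit. The `plain` and `node` types and the
`decodePlain`/`decodeNode` helpers are hypothetical. It shows that
encoding/json ignores an unexported field by default, and one common way
to fill it through a custom UnmarshalJSON.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// plain relies on the default decoding: the unexported field is skipped.
type plain struct {
	Name string `json:"name"`
	up   bool
}

// node adds a custom UnmarshalJSON that fills the unexported field.
type node struct {
	Name string
	up   bool
}

// UnmarshalJSON decodes into an exported shadow struct and then copies
// the values over, including the one destined for the unexported field.
func (n *node) UnmarshalJSON(data []byte) error {
	var aux struct {
		Name string `json:"name"`
		Up   bool   `json:"up"`
	}
	if err := json.Unmarshal(data, &aux); err != nil {
		return err
	}
	n.Name = aux.Name
	n.up = aux.Up
	return nil
}

// decodePlain and decodeNode are small helpers for the demonstration.
func decodePlain(data string) (plain, error) {
	var p plain
	err := json.Unmarshal([]byte(data), &p)
	return p, err
}

func decodeNode(data string) (node, error) {
	var n node
	err := json.Unmarshal([]byte(data), &n)
	return n, err
}

func main() {
	const input = `{"name":"a","up":true}`
	p, _ := decodePlain(input)
	n, _ := decodeNode(input)
	fmt.Println(p.up, n.up)
}
```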

* p2p/simulations: Add a sanity test case for Node.Config UnmarshalJSON

* p2p/simulations: revert to defer Unlock() pattern for Network

It's a good pattern to call `defer Unlock()` right after `Lock()` so
(new) error cases cannot forget to unlock. Let's get back to that pattern.

The pattern was abandoned in 85a79b3ad3,
while fixing a data race. That data race does not exist anymore,
since the Node.Up field got hidden behind its own lock.
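
A small sketch of why the pattern matters, using a hypothetical
`registry` type rather than the real Network: every early return below
releases the lock without having to remember an explicit `Unlock()`.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// registry is a hypothetical type used only for this illustration.
type registry struct {
	lock  sync.Mutex
	nodes map[string]bool
}

// add registers id and fails on empty or duplicate ids.
func (r *registry) add(id string) error {
	r.lock.Lock()
	defer r.lock.Unlock() // runs on every return path below

	if id == "" {
		return errors.New("empty id")
	}
	if r.nodes[id] {
		return fmt.Errorf("node %q already exists", id)
	}
	r.nodes[id] = true
	return nil
}

func main() {
	r := &registry{nodes: make(map[string]bool)}
	fmt.Println(r.add(""))  // error path, lock still released
	fmt.Println(r.add("a")) // would block forever if the lock had leaked
	fmt.Println(r.add("a")) // duplicate, another error path
}
```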

* p2p/simulations: consistent naming for test providers Node.UnmarshalJSON

* p2p/simulations: remove JSON annotation from private fields of Node

As unexported fields are not serialized.

* p2p/simulations: fix deadlock in Network.GetRandomDownNode()

Problem: GetRandomDownNode() locks -> getDownNodeIDs() ->
GetNodes() tries to lock -> deadlock

On the Network type, unexported functions must assume that `net.lock`
is already acquired and must not call exported functions, which
might try to lock again.
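
The convention can be sketched as follows. This is a simplified,
hypothetical `network` type, not the real implementation: exported
methods take the lock and delegate, unexported methods assume the lock
is held and only call other unexported methods.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// network is a hypothetical, simplified stand-in for simulations.Network.
type network struct {
	lock sync.RWMutex
	up   map[string]bool // node ID -> whether the node is up
}

// GetNodes is exported, so it acquires the lock and delegates.
func (net *network) GetNodes() []string {
	net.lock.RLock()
	defer net.lock.RUnlock()
	return net.getNodes()
}

// getNodes is unexported, so it assumes net.lock is already held.
func (net *network) getNodes() []string {
	ids := make([]string, 0, len(net.up))
	for id := range net.up {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

// GetDownNodeIDs is exported, so it acquires the lock and delegates.
func (net *network) GetDownNodeIDs() []string {
	net.lock.RLock()
	defer net.lock.RUnlock()
	return net.getDownNodeIDs()
}

// getDownNodeIDs assumes net.lock is held and therefore calls getNodes,
// never GetNodes, which would try to take the lock a second time.
func (net *network) getDownNodeIDs() []string {
	var down []string
	for _, id := range net.getNodes() {
		if !net.up[id] {
			down = append(down, id)
		}
	}
	return down
}

func main() {
	net := &network{up: map[string]bool{"a": true, "b": false, "c": false}}
	fmt.Println(net.GetDownNodeIDs())
}
```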

* p2p/simulations: ensure method conformity for Network

Connect* methods were moved to p2p/simulations.Network from
swarm/network/simulation. However, these new methods did not follow
the pattern of the other Network methods, i.e., every exported method
locks the whole Network, either for reading or for writing.

* p2p/simulations: fix deadlock during network shutdown

`TestDiscoveryPersistenceSimulationSimAdapter` often got into a deadlock.
The execution was stuck on two locks, i.e., `Kademlia.lock` and
`p2p/simulations.Network.lock`. The test got stuck roughly once in
every 20 executions.

`Kademlia` was stuck in `Kademlia.EachAddr()` and `Network` in
`Network.Stop()`.

Solution: in `Network.Stop()` `net.lock` must be released before
calling `node.Stop()` as stopping a node (somehow - I did not find
the exact code path) causes `Network.InitConn()` to be called from
`Kademlia.SuggestPeer()` and that blocks on `net.lock`.

Related ticket: https://github.com/ethersphere/go-ethereum/issues/1223
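
A hedged sketch of the shape of this fix, with hypothetical `node` and
`network` types. The callback from a stopping node into `InitConn()` is
simulated directly here, since the real code path was not identified.
The lock is held only while copying the node list and is released
before any node is stopped.

```go
package main

import (
	"fmt"
	"sync"
)

// node is a hypothetical node whose stop function may call back into
// the network.
type node struct {
	stop func()
}

// network is a hypothetical, simplified stand-in for simulations.Network.
type network struct {
	lock  sync.Mutex
	nodes []*node
	conns int
}

// InitConn takes net.lock, like the call that blocked during shutdown.
func (net *network) InitConn() {
	net.lock.Lock()
	defer net.lock.Unlock()
	net.conns++
}

// Stop copies the node list under the lock, releases the lock and only
// then stops the nodes, so callbacks that need net.lock cannot deadlock.
func (net *network) Stop() {
	net.lock.Lock()
	nodes := make([]*node, len(net.nodes))
	copy(nodes, net.nodes)
	net.lock.Unlock()

	for _, n := range nodes {
		n.stop() // may call net.InitConn, which acquires net.lock
	}
}

// newNetwork builds a network whose nodes call InitConn when stopped.
func newNetwork(n int) *network {
	net := &network{}
	for i := 0; i < n; i++ {
		net.nodes = append(net.nodes, &node{stop: net.InitConn})
	}
	return net
}

func main() {
	net := newNetwork(3)
	net.Stop()
	fmt.Println(net.conns)
}
```

Had `Stop()` kept `net.lock` while calling `n.stop()`, the first
callback would block forever on the non-reentrant mutex.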

* swarm/state: simplify if statement in DBStore.Put()

* p2p/simulations: remove faulty godoc from private function

The comment started with the wrong method name.

The method is simple and self explanatory. Also, it's private.
=> Let's just remove the comment.
2019-02-18 07:38:14 +01:00


// Copyright 2017 The go-ethereum Authors
// This file is part of the go-ethereum library.
//
// The go-ethereum library is free software: you can redistribute it and/or modify
// it under the terms of the GNU Lesser General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.
//
// The go-ethereum library is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU Lesser General Public License for more details.
//
// You should have received a copy of the GNU Lesser General Public License
// along with the go-ethereum library. If not, see <http://www.gnu.org/licenses/>.

// Package simulations simulates p2p networks.
// A mocker simulates starting and stopping real nodes in a network.
package simulations

import (
	"encoding/json"
	"net/http"
	"net/url"
	"strconv"
	"sync"
	"testing"
	"time"

	"github.com/ethereum/go-ethereum/p2p/enode"
)

func TestMocker(t *testing.T) {
	//start the simulation HTTP server
	_, s := testHTTPServer(t)
	defer s.Close()

	//create a client
	client := NewClient(s.URL)

	//start the network
	err := client.StartNetwork()
	if err != nil {
		t.Fatalf("Could not start test network: %s", err)
	}
	//stop the network to terminate
	defer func() {
		err = client.StopNetwork()
		if err != nil {
			t.Fatalf("Could not stop test network: %s", err)
		}
	}()

	//get the list of available mocker types
	resp, err := http.Get(s.URL + "/mocker")
	if err != nil {
		t.Fatalf("Could not get mocker list: %s", err)
	}
	defer resp.Body.Close()

	if resp.StatusCode != 200 {
		t.Fatalf("Invalid Status Code received, expected 200, got %d", resp.StatusCode)
	}

	//check the list is at least 1 in size
	var mockerlist []string
	err = json.NewDecoder(resp.Body).Decode(&mockerlist)
	if err != nil {
		t.Fatalf("Error decoding JSON mockerlist: %s", err)
	}

	if len(mockerlist) < 1 {
		t.Fatalf("No mockers available")
	}

	nodeCount := 10
	var wg sync.WaitGroup

	events := make(chan *Event, 10)
	var opts SubscribeOpts
	sub, err := client.SubscribeNetwork(events, opts)
	if err != nil {
		t.Fatalf("Could not subscribe to network events: %s", err)
	}
	defer sub.Unsubscribe()

	//wait until all nodes are started and connected
	//store every node up event in a map (value is irrelevant, mimic Set datatype)
	nodemap := make(map[enode.ID]bool)
	wg.Add(1)
	nodesComplete := false
	connCount := 0
	go func() {
		defer wg.Done()
		for {
			select {
			case event := <-events:
				if isNodeUp(event) {
					//add the corresponding node ID to the map
					nodemap[event.Node.Config.ID] = true
					//this means all nodes got a nodeUp event, so we can continue the test
					if len(nodemap) == nodeCount {
						nodesComplete = true
					}
				} else if event.Conn != nil && nodesComplete {
					connCount++
					if connCount == (nodeCount-1)*2 {
						return
					}
				}
			case <-time.After(30 * time.Second):
				//t.Fatalf must only be called from the goroutine running the test,
				//so report the failure with t.Errorf and return
				t.Errorf("Timeout waiting for nodes being started up!")
				return
			}
		}
	}()

	//take the last element of the mockerlist as the default mocker-type to ensure one is enabled
	mockertype := mockerlist[len(mockerlist)-1]
	//still, use hardcoded "probabilistic" one if available ;)
	for _, m := range mockerlist {
		if m == "probabilistic" {
			mockertype = m
			break
		}
	}
	//start the mocker with nodeCount number of nodes
	resp, err = http.PostForm(s.URL+"/mocker/start", url.Values{"mocker-type": {mockertype}, "node-count": {strconv.Itoa(nodeCount)}})
	if err != nil {
		t.Fatalf("Could not start mocker: %s", err)
	}
	//close the body of every response, not only the first one
	resp.Body.Close()
	if resp.StatusCode != 200 {
		t.Fatalf("Invalid Status Code received for starting mocker, expected 200, got %d", resp.StatusCode)
	}

	wg.Wait()

	//check there are nodeCount number of nodes in the network
	nodesInfo, err := client.GetNodes()
	if err != nil {
		t.Fatalf("Could not get nodes list: %s", err)
	}

	if len(nodesInfo) != nodeCount {
		t.Fatalf("Expected %d number of nodes, got: %d", nodeCount, len(nodesInfo))
	}

	//stop the mocker
	resp, err = http.Post(s.URL+"/mocker/stop", "", nil)
	if err != nil {
		t.Fatalf("Could not stop mocker: %s", err)
	}
	resp.Body.Close()
	if resp.StatusCode != 200 {
		t.Fatalf("Invalid Status Code received for stopping mocker, expected 200, got %d", resp.StatusCode)
	}

	//reset the network
	resp, err = http.Post(s.URL+"/reset", "", nil)
	if err != nil {
		t.Fatalf("Could not reset network: %s", err)
	}
	resp.Body.Close()

	//now the number of nodes in the network should be zero
	nodesInfo, err = client.GetNodes()
	if err != nil {
		t.Fatalf("Could not get nodes list: %s", err)
	}

	if len(nodesInfo) != 0 {
		t.Fatalf("Expected empty list of nodes, got: %d", len(nodesInfo))
	}
}

func isNodeUp(event *Event) bool {
	return event.Node != nil && event.Node.Up()
}