MongoDB
Vadim Rybalko, 2014-02-06 11:26:08

MongoDB replica set: stuck in STARTUP2 forever?

I have a MongoDB replica set that normally works fine with three members: primary, secondary and arbiter. This is rs.status() as seen from the newly added fourth member, bac-db:

MongoDB shell version: 2.4.8
connecting to: test
set_v2:STARTUP2> rs.status( )
{
  "set" : "set_v2",
  "date" : ISODate("2014-02-06T08:20:54Z"),
  "myState" : 5,
  "syncingTo" : "deb-db:27017",
  "members" : [
    {
      "_id" : 1,
      "name" : "eng-db:27017",
      "health" : 1,
      "state" : 7,
      "stateStr" : "ARBITER",
      "uptime" : 169086,
      "lastHeartbeat" : ISODate("2014-02-06T08:20:53Z"),
      "lastHeartbeatRecv" : ISODate("2014-02-06T08:20:54Z"),
      "pingMs" : 50
    },
    {
      "_id" : 2,
      "name" : "jam-db:27017",
      "health" : 1,
      "state" : 2,
      "stateStr" : "SECONDARY",
      "uptime" : 169086,
      "optime" : Timestamp(1391674853, 18),
      "optimeDate" : ISODate("2014-02-06T08:20:53Z"),
      "lastHeartbeat" : ISODate("2014-02-06T08:20:53Z"),
      "lastHeartbeatRecv" : ISODate("2014-02-06T08:20:53Z"),
      "pingMs" : 50,
      "syncingTo" : "deb-db:27017"
    },
    {
      "_id" : 3,
      "name" : "deb-db:27017",
      "health" : 1,
      "state" : 1,
      "stateStr" : "PRIMARY",
      "uptime" : 169086,
      "optime" : Timestamp(1391674852, 50),
      "optimeDate" : ISODate("2014-02-06T08:20:52Z"),
      "lastHeartbeat" : ISODate("2014-02-06T08:20:52Z"),
      "lastHeartbeatRecv" : ISODate("2014-02-06T08:20:52Z"),
      "pingMs" : 50
    },
    {
      "_id" : 4,
      "name" : "bac-db:27017",
      "health" : 1,
      "state" : 5,
      "stateStr" : "STARTUP2",
      "uptime" : 169109,
      "optime" : Timestamp(1391505782, 63),
      "optimeDate" : ISODate("2014-02-04T09:23:02Z"),
      "errmsg" : "syncing to: deb-db:27017",
      "self" : true
    }
  ],
  "ok" : 1
}
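To put a number on how far behind the new member is, the sketch below computes each data-bearing member's lag relative to the primary from an rs.status()-style document. It is a plain JavaScript function, not a MongoDB API; it assumes the optimeDate values are Date objects (the shell prints them as ISODate(...)) and uses only the fields copied from the output above.

```javascript
// Sketch: replication lag of each member relative to the PRIMARY, computed from
// an rs.status()-style document whose optimeDate fields are Date objects.
function replicationLag(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) throw new Error("no PRIMARY in status document");
  return status.members
    .filter((m) => m.optimeDate instanceof Date) // arbiters carry no optime
    .map((m) => ({
      name: m.name,
      state: m.stateStr,
      lagSeconds: (primary.optimeDate - m.optimeDate) / 1000,
    }));
}

// Values copied from the rs.status() output above (other fields omitted).
const status = {
  members: [
    { name: "eng-db:27017", stateStr: "ARBITER" },
    { name: "jam-db:27017", stateStr: "SECONDARY", optimeDate: new Date("2014-02-06T08:20:53Z") },
    { name: "deb-db:27017", stateStr: "PRIMARY", optimeDate: new Date("2014-02-06T08:20:52Z") },
    { name: "bac-db:27017", stateStr: "STARTUP2", optimeDate: new Date("2014-02-04T09:23:02Z") },
  ],
};

for (const row of replicationLag(status)) {
  console.log(`${row.name} ${row.state} lag=${row.lagSeconds}s`);
}
// prints:
// jam-db:27017 SECONDARY lag=-1s
// deb-db:27017 PRIMARY lag=0s
// bac-db:27017 STARTUP2 lag=169070s
```

The -1 s for jam-db only means its heartbeat snapshot is a second newer than the primary's. The 169070 s (about 47 hours) for bac-db is almost exactly its uptime of 169109 s, which suggests its optime has barely moved since shortly after it was started.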

The databases are fairly large. I needed to add another member, so I added it the standard way, but after a while the new member stopped pulling data from the primary.
Log excerpt from the new member (the one in STARTUP2):
Thu Feb  6 11:42:25.931 [rsBackgroundSync] Socket recv() timeout  212.158.000.000:27017
Thu Feb  6 11:42:25.931 [rsBackgroundSync] SocketException: remote: 212.158.000.000:27017 error: 9001 socket exception [RECV_TIMEOUT] server [212.158.000.000:27017] 
Thu Feb  6 11:42:25.931 [rsBackgroundSync] DBClientCursor::init call() failed
Thu Feb  6 11:42:25.931 [rsBackgroundSync] replSet not trying to sync from secondary_host:27017, it is vetoed for 389 more seconds

Log excerpt from the primary:
Thu Feb  6 12:16:19.894 [conn6710131] query local.oplog.rs query: { ts: { $gte: Timestamp 1391505782000|63 } } cursorid:7488360248332995795 ntoreturn:0 ntoskip:0 nscanned:102 keyUpdates:0 numYields: 19063 locks(micros) r:7039453 nreturned:101 reslen:16421 39051ms
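The primary's slow-query line can be decoded to check where the new member is reading the oplog from. The sketch below is a hypothetical regex-based parser written for the 2.4-style line shown above, where the Timestamp is logged as milliseconds|increment; it is not a MongoDB tool and will not match other log formats.

```javascript
// Sketch: extract the oplog start point, batch size and duration from the
// primary's slow-query log line (2.4-style format, Timestamp as <ms>|<inc>).
function parseOplogQuery(line) {
  const ts = line.match(/ts: \{ \$gte: Timestamp (\d+)\|(\d+) \}/);
  const took = line.match(/ (\d+)ms\s*$/);
  const returned = line.match(/nreturned:(\d+)/);
  if (!ts || !took || !returned) return null; // not a line of this shape
  return {
    from: new Date(Number(ts[1])), // start of the requested oplog range
    increment: Number(ts[2]),
    nreturned: Number(returned[1]),
    durationMs: Number(took[1]),
  };
}

const line =
  "Thu Feb  6 12:16:19.894 [conn6710131] query local.oplog.rs query: " +
  "{ ts: { $gte: Timestamp 1391505782000|63 } } cursorid:7488360248332995795 " +
  "ntoreturn:0 ntoskip:0 nscanned:102 keyUpdates:0 numYields: 19063 " +
  "locks(micros) r:7039453 nreturned:101 reslen:16421 39051ms";

const q = parseOplogQuery(line);
console.log(q.from.toISOString(), q.increment, q.nreturned, q.durationMs);
// prints: 2014-02-04T09:23:02.000Z 63 101 39051
```

The decoded start point equals bac-db's optime in rs.status() (Timestamp(1391505782, 63), 2014-02-04T09:23:02Z), so this is the new member's own oplog query, and a batch of only 101 documents took about 39 seconds. That is consistent with the recv() timeouts in the new member's log.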

So it looks like the oplog is very large: the new member has some timeout for this request, and the oplog query it issues in STARTUP2 does not finish within it (the query above took 39051 ms).
How can I get past this and bring the new secondary up?

1 answer(s)
egor_nullptr, 2014-02-06

* Remove the problem node from the replica set
* Delete all the data it has already copied from the PRIMARY
* Remove the arbiter from the replica set
* Add the first node back as a regular (full) member
Something like this.
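In the mongo shell these steps would presumably be run on the primary as rs.remove("bac-db:27017"), then stopping bac-db and emptying its data directory, then rs.remove("eng-db:27017") and rs.add("bac-db:27017"). To make the resulting configuration explicit, here is a minimal sketch of steps 1, 3 and 4 as a pure function over an rs.conf()-style document. planResync is a hypothetical helper, and the config below is reconstructed from the rs.status() output with a made-up version number, since the actual rs.conf() was not posted.

```javascript
// Hypothetical helper: applies steps 1, 3 and 4 of the answer to an
// rs.conf()-style document. Step 2 (deleting the node's data files) happens on
// that node's filesystem and is not represented here.
function planResync(conf, problemHost) {
  // Steps 1 and 3: drop the problem node and every arbiter.
  const kept = conf.members.filter((m) => m.host !== problemHost && !m.arbiterOnly);
  // Give the re-added node an _id not used by any remaining member.
  const nextId = Math.max(...kept.map((m) => m._id)) + 1;
  return {
    ...conf,
    version: conf.version + 1, // each reconfiguration carries a higher version
    // Step 4: re-add the wiped node as an ordinary data-bearing member.
    members: [...kept, { _id: nextId, host: problemHost }],
  };
}

// Members reconstructed from rs.status(); version 4 is an assumed value.
const conf = {
  _id: "set_v2",
  version: 4,
  members: [
    { _id: 1, host: "eng-db:27017", arbiterOnly: true },
    { _id: 2, host: "jam-db:27017" },
    { _id: 3, host: "deb-db:27017" },
    { _id: 4, host: "bac-db:27017" },
  ],
};

console.log(JSON.stringify(planResync(conf, "bac-db:27017").members));
// prints: [{"_id":2,"host":"jam-db:27017"},{"_id":3,"host":"deb-db:27017"},{"_id":4,"host":"bac-db:27017"}]
```

A document built this way could be applied in one step with rs.reconfig(cfg) on the primary instead of several rs.remove()/rs.add() calls; the helper leaves the input config unmodified so the old and new versions can be compared first.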
