elasticsearch
Alexander Yelagin, 2015-06-09 07:25:17

Why is the data not restored when restoring an Elasticsearch database from a snapshot?

Hello.
I have a database on Elasticsearch 1.5 and back it up with the built-in snapshot mechanism. The filesystem repository is registered like this:

curl -XPUT 'http://localhost:9200/_snapshot/test' -d '{"type": "fs","settings": {"location": "/path","compress": true}}'
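(That command only registers the fs repository; the snapshot itself, referenced below as snapshot_1, would then have been created with something along these lines:)

curl -XPUT 'http://localhost:9200/_snapshot/test/snapshot_1?wait_for_completion=true'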

The snapshot completes fine. I then transferred the snapshot data to another server running Elasticsearch 1.5.2 (a clean install, no data) and started a restore:
curl -XPOST "http://my_ip:9200/_snapshot/test/snapshot_1/_restore?wait_for_completion=true"
Elasticsearch is clearly doing something: the restore processes are running (the backup is about 200 GB). But a day after the restore started, nothing has been restored; the indices are not filling up, only huge logs are being written:
[2015-06-09 07:22:02,920][WARN ][indices.cluster          ] [Batwing]  marking and sending shard failed due to [failed recovery]
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [bulks][5] failed recovery
  at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:162)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: [bulks][5] restore failed
  at org.elasticsearch.index.snapshots.IndexShardSnapshotAndRestoreService.restore(IndexShardSnapshotAndRestoreService.java:135)
  at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:109)
  ... 3 more
Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: [bulks][5] failed to restore snapshot [snapshot_1]
  at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository.restore(BlobStoreIndexShardRepository.java:164)
  at org.elasticsearch.index.snapshots.IndexShardSnapshotAndRestoreService.restore(IndexShardSnapshotAndRestoreService.java:126)
  ... 4 more
Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: [bulks][5] Can't restore corrupted shard
  at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$RestoreContext.restore(BlobStoreIndexShardRepository.java:716)
  at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository.restore(BlobStoreIndexShardRepository.java:162)
  ... 5 more
Caused by: org.apache.lucene.index.CorruptIndexException: [bulks][5] Preexisting corrupted index [corrupted_dIp4fqTkQ7euY5ESwklBMg] caused by: CorruptIndexException[verification failed (hardware problem?) : expected=14dlt9i actual=null writtenLength=17241872 expectedLength=17242199 (resource=name [_1cpg.cfs], length [17242199], checksum [14dlt9i], writtenBy [4.10.4])]
org.apache.lucene.index.CorruptIndexException: verification failed (hardware problem?) : expected=14dlt9i actual=null writtenLength=17241872 expectedLength=17242199 (resource=name [_1cpg.cfs], length [17242199], checksum [14dlt9i], writtenBy [4.10.4])
  at org.elasticsearch.index.store.Store$LuceneVerifyingIndexOutput.verify(Store.java:1227)
  at org.elasticsearch.index.store.Store.verify(Store.java:460)
  at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$RestoreContext.restoreFile(BlobStoreIndexShardRepository.java:813)
  at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$RestoreContext.restore(BlobStoreIndexShardRepository.java:770)
  at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository.restore(BlobStoreIndexShardRepository.java:162)
  at org.elasticsearch.index.snapshots.IndexShardSnapshotAndRestoreService.restore(IndexShardSnapshotAndRestoreService.java:126)
  at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:109)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)

  at org.elasticsearch.index.store.Store.failIfCorrupted(Store.java:547)
  at org.elasticsearch.index.store.Store.failIfCorrupted(Store.java:528)
  at org.elasticsearch.index.store.Store.getMetadata(Store.java:219)
  at org.elasticsearch.index.store.Store.getMetadataOrEmpty(Store.java:185)
  at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$RestoreContext.restore(BlobStoreIndexShardRepository.java:713)
  ... 6 more
[2015-06-09 07:22:02,920][WARN ][cluster.action.shard     ] [Batwing] [bulks][5] received shard failed for [bulks][5], node[gjYJdwYVRWeA6QCUec-hZw], [P], restoring[test:snapshot_1], s[INITIALIZING], indexUUID [2UxyUL5BT4CunAdHYuceNQ], reason [shard failure [failed recovery][IndexShardGatewayRecoveryException[[bulks][5] failed recovery]; nested: IndexShardRestoreFailedException[[bulks][5] restore failed]; nested: IndexShardRestoreFailedException[[bulks][5] failed to restore snapshot [snapshot_1]]; nested: IndexShardRestoreFailedException[[bulks][5] Can't restore corrupted shard]; nested: CorruptIndexException[[bulks][5] Preexisting corrupted index [corrupted_dIp4fqTkQ7euY5ESwklBMg] caused by: CorruptIndexException[verification failed (hardware problem?) : expected=14dlt9i actual=null writtenLength=17241872 expectedLength=17242199 (resource=name [_1cpg.cfs], length [17242199], checksum [14dlt9i], writtenBy [4.10.4])]
org.apache.lucene.index.CorruptIndexException: verification failed (hardware problem?) : expected=14dlt9i actual=null writtenLength=17241872 expectedLength=17242199 (resource=name [_1cpg.cfs], length [17242199], checksum [14dlt9i], writtenBy [4.10.4])
  at org.elasticsearch.index.store.Store$LuceneVerifyingIndexOutput.verify(Store.java:1227)
  at org.elasticsearch.index.store.Store.verify(Store.java:460)
  at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$RestoreContext.restoreFile(BlobStoreIndexShardRepository.java:813)
  at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$RestoreContext.restore(BlobStoreIndexShardRepository.java:770)
  at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository.restore(BlobStoreIndexShardRepository.java:162)
  at org.elasticsearch.index.snapshots.IndexShardSnapshotAndRestoreService.restore(IndexShardSnapshotAndRestoreService.java:126)
  at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:109)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
]; ]]

1 answer
Alexey Yamschikov, 2015-06-11
@mobilesfinks

The snapshot completes fine.

How did you verify that the snapshot is actually fine?
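A hedged way to check on 1.x, run against whichever node has the test repository registered: the snapshot info and per-shard status should report state SUCCESS with no shard failures; a PARTIAL or FAILED snapshot cannot be expected to restore cleanly.

curl -XGET 'http://localhost:9200/_snapshot/test/snapshot_1?pretty'
curl -XGET 'http://localhost:9200/_snapshot/test/snapshot_1/_status?pretty'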
Caused by: org.apache.lucene.index.CorruptIndexException: [bulks][5] Preexisting corrupted index
...
Can't restore corrupted shard
Judging by those errors, the index in the snapshot is corrupted. A few things to try:
  • Check the integrity of the file system and the disks on the source server, just to make sure it is not a hardware problem (the exception itself hints at "hardware problem?").
  • Try to transfer the database by simply copying the data directory with rsync (see the sketch below).
  • Or join both servers into one cluster and let Elasticsearch relocate all shards to the second node; you can watch that process through a plugin such as kopf.
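A minimal sketch of the rsync option, assuming a package install with the default data path /var/lib/elasticsearch and the usual init script name (both are assumptions, adjust to your layout; "user" is a placeholder). Elasticsearch should be stopped on both machines while copying so the files are consistent:

# stop elasticsearch on both servers first (init script name is an assumption)
sudo service elasticsearch stop
# copy the whole data directory from the old server to the new one
rsync -av --progress /var/lib/elasticsearch/ user@my_ip:/var/lib/elasticsearch/
# start elasticsearch on the new server again and watch the cluster come up
curl -XGET 'http://my_ip:9200/_cluster/health?pretty'

For the cluster option, both nodes would need the same cluster.name and each other's address in discovery.zen.ping.unicast.hosts in elasticsearch.yml (zen discovery on 1.x); once the second node joins, shards can be moved off the old node and the relocation watched in kopf.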
