Stan_1, 2015-04-05 21:43:51
elasticsearch

What hardware to choose for ElasticSearch?

Good afternoon,
I would appreciate some advice. I plan to buy a dedicated server for ElasticSearch for one of my projects. The indexes total 25 GB, about 42 million records, and the record count grows very slowly, so the database is almost static. The workload has two parts:
1. Building and rebuilding the data. This happens once a week and churns through the entire database to produce the data sets that are served to users.
2. Serving those data sets to users quickly. This is the main priority.
Question 1: what is the best way to set up a system with these characteristics? I have three servers in mind:
Server 1 - 8-core CPU, 96 GB RAM, 2x2 TB HDD
Server 2 - 8-core CPU, 96 GB RAM, 2x600 GB SAS
Server 3 - 8-core CPU, 32 GB RAM, 2x300 GB SSD
As far as I understand, ES itself works optimally only when it is given no more than 32 GB of memory. With that in mind, I am considering the following configurations, aiming for almost entirely in-memory operation:
1. 16 GB for the ES heap, 16 GB for the system, and a 64 GB RAM disk for the /tmp data (server 1 or 2)
2. 64 GB for the ES heap (server 1 or 2)
3. 16 GB for the ES heap, relying on the speed of the SSDs (server 3)
Are there any other options? And how necessary is SAS for this kind of load?
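The three configurations can be compared with some quick arithmetic. Below is a minimal sketch, not a sizing tool: the `Layout` class and `assess` function are hypothetical names, the 32 GB figure is the approximate JVM compressed-pointer threshold rather than an exact limit, and the "heap at most half of RAM" check follows the commonly cited Elasticsearch guideline of leaving the rest of memory to the filesystem cache.

```python
from dataclasses import dataclass

INDEX_SIZE_GB = 25             # total index size quoted in the question
COMPRESSED_OOPS_LIMIT_GB = 32  # approximate JVM threshold, not an exact figure


@dataclass
class Layout:
    """One candidate memory split (hypothetical helper for this sketch)."""
    name: str
    ram_gb: int
    heap_gb: int
    ramdisk_gb: int = 0

    @property
    def os_and_cache_gb(self) -> int:
        # Whatever is not heap or RAM disk is left to the OS and its page cache.
        return self.ram_gb - self.heap_gb - self.ramdisk_gb


def assess(layout: Layout, index_gb: int = INDEX_SIZE_GB) -> dict:
    """Run three rough checks against a layout."""
    return {
        "heap_below_32gb": layout.heap_gb < COMPRESSED_OOPS_LIMIT_GB,
        "heap_at_most_half_of_ram": layout.heap_gb * 2 <= layout.ram_gb,
        # The index counts as "in RAM" if it fits on the RAM disk
        # or in the memory left over for the page cache.
        "index_fits_in_ram": (layout.ramdisk_gb >= index_gb
                              or layout.os_and_cache_gb >= index_gb),
    }


LAYOUTS = [
    Layout("option 1 (server 1 or 2)", ram_gb=96, heap_gb=16, ramdisk_gb=64),
    Layout("option 2 (server 1 or 2)", ram_gb=96, heap_gb=64),
    Layout("option 3 (server 3)", ram_gb=32, heap_gb=16),
]

for layout in LAYOUTS:
    print(layout.name, assess(layout))
```

Under these assumptions, option 2 is the only one that fails both heap checks, and option 3 is the only one where the 25 GB index does not fit in RAM, which is exactly where the SSDs would have to carry the load.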
Question 2: once a week the data is rebuilt by various scripts. This takes 2.5 hours and touches up to 30% of the records. Everything is built on bulk, mget, and scroll, that is, on batch operations. Again, just thinking out loud, I have two ideas for how to do this optimally:
1. Rebuild the data on the test server, then simply stream all the finished records from the test server to production, without pulling in the many reference tables that make up each target record. Pros: a) a complete copy of the production data on the test server, b) no unpredictable avalanche of load on the production server, c) data is not knocked out of the cache while the data is being rebuilt
full synchronization. Pros: a) the software infrastructure is greatly simplified.
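The "stream finished records to production" idea maps naturally onto reading hits with scroll and writing them back with bulk. Here is a minimal sketch of the conversion step only, with no network calls: `batched`, `to_bulk_body`, the index name `records`, and the sample hits are all hypothetical. It assumes each hit carries `_id` and `_source`, as scroll responses do, and it omits the `_type` field that Elasticsearch releases from that period also expect in the action line.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List

Hit = Dict[str, Any]


def batched(items: Iterable[Any], size: int) -> Iterator[List[Any]]:
    """Yield consecutive lists of at most `size` items."""
    batch: List[Any] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def to_bulk_body(hits: Iterable[Hit], target_index: str) -> str:
    """Turn scroll hits into a newline-delimited bulk request body.

    Each document becomes two lines: an action line and its source.
    """
    lines: List[str] = []
    for hit in hits:
        action = {"index": {"_index": target_index, "_id": hit["_id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(hit["_source"]))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n" if lines else ""


# Hypothetical hits, shaped like the entries of a scroll page.
sample_hits = [
    {"_id": "1", "_source": {"title": "first"}},
    {"_id": "2", "_source": {"title": "second"}},
    {"_id": "3", "_source": {"title": "third"}},
]

for page in batched(sample_hits, size=2):
    print(to_bulk_body(page, target_index="records"), end="")
```

Each body produced this way would be sent to the production cluster's bulk endpoint. Keeping the batch size modest is one way to limit the load spike on production that point b) is concerned with.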
My arguments turned out rather thin, but that comes from ignorance. I am just looking for best practices. :)
Thanks in advance!
