big data
tibitibidoh, 2015-10-27 16:41:45

Which HDD is best for a research project?

Friends who are well versed in hardware, please tell me which HDD is best for storing a large amount of data.
To give context: as a scientific experiment, I want to crawl a large number of (M +) sites for subsequent analysis. For this I need to stock up on hard drives; for a start I think 100 TB should be enough, and I can buy more later...
Accordingly, I would like to hear professionals' answers to the following questions:
1) What brands/models are now the most reliable (WD/Toshiba/Seagate)?
2) From the point of view of financial savings, what is the optimal amount to take today?
3) Are there any features of storing a large number of small files (html files, images for them)?
Thanks everyone for the replies!


4 answer(s)
Puma Thailand, 2015-10-27
@opium

It is most cost-effective to buy only 4 TB disks; they are the cheapest per terabyte.
For reliability statistics, read the Backblaze blog:
https://www.backblaze.com/blog/
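The "4 TB is cheapest" claim comes down to comparing price per terabyte rather than price per drive. A minimal sketch of that comparison; the prices below are invented placeholders, not real quotes:

```python
# Hypothetical price list (capacity in TB, price in arbitrary currency units).
# These numbers are illustrations only; plug in current market prices.
drives = {
    "2 TB": (2, 80.0),
    "4 TB": (4, 120.0),
    "6 TB": (6, 240.0),
    "8 TB": (8, 360.0),
}

# Rank by price per TB, the metric that matters when buying in bulk.
for name, (tb, price) in sorted(drives.items(), key=lambda kv: kv[1][1] / kv[1][0]):
    print(f"{name}: {price / tb:.2f} per TB")
```

With these placeholder prices the 4 TB model wins on price per TB, which is the reasoning behind the advice above.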

Pavel, 2015-10-29
@pbt39

I just want to write... I'm selling a data center, cheap...
Let's do the math with www.raid-calculator.com :
take 6 sets of 8 disks of 3 TB each and assemble one large ZFS pool from them (for ZFS to behave well, it needs free space left over, so take 7 sets of 8 disks instead).
The write speed will not be very high; I think the analog of RAID 6 is enough. It will let you avoid losing data and sleep peacefully during a rebuild, and the overhead is not as high as with mirrors...
The result should look something like this:
gal.redsquirrel.me/images/house_projects/server_ro...
And don't put millions of files in one directory...
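The capacity math behind this layout can be sketched as follows. I assume raidz2 (the ZFS analog of RAID 6 mentioned above, surviving two disk failures per group) and the common rule of thumb of keeping roughly 20% of the pool free; the exact reserve is a judgment call, not a fixed figure:

```python
# Layout from the answer: 7 groups (vdevs) of 8 x 3 TB disks, each raidz2.
vdevs = 7
disks_per_vdev = 8
disk_tb = 3
parity_disks = 2  # raidz2 loses two disks' worth of capacity per vdev

usable_tb = vdevs * (disks_per_vdev - parity_disks) * disk_tb
effective_tb = usable_tb * 0.8  # keep ~20% free so ZFS performs well

print(f"raw usable: {usable_tb} TB")
print(f"with free-space reserve: ~{effective_tb:.0f} TB")
```

So 7 × 8 disks yields about 126 TB of usable space, and with the free-space reserve you land right around the 100 TB the question asks for, which is presumably why the answer bumps 6 sets up to 7.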

postgree, 2015-11-02
@postgree

1) What brands/models are currently the most reliable (WD/Toshiba/Seagate)?

Toshiba among the cheaper ones, Hitachi (Ultrastar) among the more expensive.
2) From the point of view of financial savings, what is the optimal amount to take today?
If it's simply about the best volume-to-price ratio, then 4 TB.
3) Are there any features of storing a large number of small files (html files, images for them)?
How small? How will you pool your disk space? What actually matters here is the block size of the resulting volume: the larger the block, the better the performance on large files, and the greater the disk-space overhead on small files.
I measured the overhead on a test sample of files using the metadata table, and on my data reducing the block size below 16384 bytes made no sense: a difference of a couple of percent is not fundamental.
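The overhead being measured here is allocation slack: each file occupies whole blocks, so on average part of its last block is wasted. A small estimator under that assumption; the file sizes are invented for illustration:

```python
# Estimate wasted ("slack") space for a set of file sizes at a given block size.
# Each file is assumed to occupy ceil(size / block) whole blocks.
def slack_bytes(sizes, block):
    # (-s) % block is the unused tail of the file's last block (0 if s divides evenly)
    return sum((-s) % block for s in sizes)

sizes = [1200, 5300, 16000, 900, 40000]  # bytes; illustration only
total = sum(sizes)
for block in (4096, 16384, 65536):
    waste = slack_bytes(sizes, block)
    print(f"block {block}: {waste} bytes slack, {waste / (total + waste):.1%} overhead")
```

Running this over a real sample of crawled HTML and images, as the answer describes, is how you find the block size below which the savings stop being worth it.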
Will you keep the files in the filesystem, in a database, or in something homegrown?
I sorted the files into directories by the scheme /file_dir/{md5h::substr(0,2)}/{md5h::substr(2,2)}/sha256h
The hashes had to be calculated for the task anyway, so I didn't overthink it beyond that.
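The sharding scheme above can be sketched in a few lines. This is my reading of the path template, not the author's actual code; `file_dir` is the root named in the answer, and the helper name is mine:

```python
import hashlib

def shard_path(content: bytes, root: str = "file_dir") -> str:
    """Build the path /root/{md5[0:2]}/{md5[2:4]}/{sha256} for a file's bytes."""
    md5h = hashlib.md5(content).hexdigest()
    sha256h = hashlib.sha256(content).hexdigest()
    return f"/{root}/{md5h[0:2]}/{md5h[2:4]}/{sha256h}"

print(shard_path(b"<html>example</html>"))
```

Two levels of two hex characters give 256 × 256 = 65536 buckets, which spreads millions of files thinly enough that no single directory becomes huge, echoing the earlier warning about millions of files in one directory.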
