linux
Mouvdy, 2016-08-31 01:30:00

What is the most reliable and efficient option for data storage?

Greetings,
Looking for an optimal/secure/fast way to store data.
I work with a bash script that, after processing, must record the "website domain" somewhere and then check whether it has already been processed.
At the moment, about 1.5 million domains have already accumulated (domain type: yandex.ru, google.com, toster.ru, etc.).
I did not think that there would be such a volume of data, so I did not immediately think about scalability.
Considering that this is only "for me", I implemented it crudely and in haste: I create a folder named after each domain inside the sites/ directory and then just check whether the directory exists :) Everything works pretty fast.
Code example:

if [ -d "$systemdir/$downloadfolder" ]; then
    echo "nothing to do"   # the folder already exists
else
    mkdir -p "$systemdir/$downloadfolder"
fi

But then I had to move servers and quickly redeploy my working system. There are now too many directories, and transferring the data to another host is painful, even if I just run ls sites > list.sh, prepend a mkdir to each line, and execute it as a script.
So I started thinking about changing how this works.
1. Store everything in a text file and append to / search it - a file of about 130 MB; it seems to work quickly.
2. Store everything in MySQL and run the queries I need from the bash script.
But in both cases a problem arises: at peak times I will have about 800-1300 write/lookup requests per second. I'm afraid the text file may get corrupted by concurrent writes (a locking sketch follows below), and MySQL will simply fall over (LA > 700) under that load.
What is the best way to do this? What other options are there?
Or should I stick with my "file structure" storage, since for my current tasks it is the safest and fastest approach, and just accept the slow deployment when restoring from backups?
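For reference, option 1 can be made safe against concurrent writes by serializing them with flock; a minimal sketch, assuming a single domains.txt list (the file and lock paths are illustrative):

#!/bin/bash
domain="$1"
domains_file="/var/lib/sites/domains.txt"   # assumed location
touch "$domains_file"                       # make sure the list exists

(
    flock -x 200                            # exclusive lock: one writer at a time
    if grep -qxF "$domain" "$domains_file"; then
        echo "nothing to do"                # domain already recorded
    else
        echo "$domain" >> "$domains_file"   # record the new domain
    fi
) 200>"$domains_file.lock"

Note that grep still scans the whole 130 MB file on every lookup, so at 800-1300 requests per second this linear search, not the locking, is the real bottleneck.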

5 answers
Walt Disney, 2016-08-31
@ruFelix

Take Redis and tune its disk sync depending on your paranoia and your server's performance.
The point is that a hash table is the ideal data structure for your task.
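For example, the disk-sync trade-off can be adjusted at runtime with redis-cli; the settings below are illustrative, not recommendations:

# Redis durability knobs, from most paranoid to fastest.
redis-cli CONFIG SET appendonly yes         # enable the append-only file
redis-cli CONFIG SET appendfsync always     # fsync every write: safest, slowest
redis-cli CONFIG SET appendfsync everysec   # fsync once per second: loses at most ~1s
redis-cli CONFIG SET appendfsync no         # let the OS flush: fastest, least safe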

Sergey, 2016-08-31
@edinorog

So I started thinking about changing how this works.
1. Store everything in a text file and append to / search it - a file of about 130 MB; it seems to work quickly.

You should probably read up on how search in a text file actually works. You'll be stunned, and thoughts like that will never visit your bright head again.

Roman Mirilaczvili, 2016-09-03
@2ord

Forget the directories - they are just extra load on the file system every time you check whether an entry exists.
Instead, use Redis with its SADD and SISMEMBER commands.
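A minimal sketch of that check from bash (the key name processed_domains is an assumption). SADD returns 1 only when the member is new, so a single atomic call both records the domain and answers whether it was already there:

#!/bin/bash
domain="$1"

# SADD is atomic: 1 = newly added, 0 = was already in the set.
# (A read-only check would be: redis-cli SISMEMBER processed_domains "$domain")
if [ "$(redis-cli SADD processed_domains "$domain")" -eq 1 ]; then
    echo "processing new domain: $domain"
else
    echo "nothing to do"   # already processed
fi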

Alexander Masterov, 2016-08-31
@AlexMasterov

I'm more looking for the best and easiest way to work/deploy from backups.

Try Tarantool.
- persistence: a transaction log (.xlog) plus full database snapshots (.snap);
- simple server-to-server transfer (just copy all the files with scp);
- simple hot backup and replica attachment;
- faster than Redis :)
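The transfer described above might look like this (/var/lib/tarantool is a common default data directory; check where your instance actually keeps its .snap and .xlog files):

# Stop the instance (or take a fresh snapshot) so the files are consistent,
# then copy the snapshots and write-ahead logs to the new host.
scp /var/lib/tarantool/*.snap /var/lib/tarantool/*.xlog \
    newserver:/var/lib/tarantool/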

Artemy, 2016-08-31
@MetaAbstract

Take Berkeley DB: it is fast and reliable, handles large volumes easily, and its functionality is powerful.
