How to organize file synchronization for your own CDN?
We needed to serve a moderate volume of static content (250-500 TB/month). We built our own "CDN" as N Nginx servers with data synchronization via rsync. It is important to us that new content propagates across the servers quickly after an update, so we set it up as master-slave, where the master is the central server to which content is uploaded. The master works in passive mode: rsync runs on each slave from cron and pulls the data.
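For reference, the slave side of this scheme is typically just a cron entry that pulls from the master; a minimal sketch, assuming hypothetical host names, module name and paths:

# /etc/cron.d/cdn-sync on each slave (hypothetical names and paths)
* * * * * root rsync -a --delete rsync://master.example.com/static/ /var/www/static/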
The advantages of this scheme are as follows:
- if the master fails, each server still has a copy of all the data
- the master knows nothing about the slaves, so adding a new server to the group is easy
- to push out an update, you just upload the new content (no other action is required)
The downside:
Although the amount of static content is small (5-6 GB), ten servers pulling from the master with rsync every minute put a noticeable load on it.
I would like to keep all the advantages and somehow deal with this downside. As I understand it, rsync has to rescan the files on every run, and that is what eats the CPU, even though 99% of the time nothing changes (updates happen 2-3 times a day).
How can this scheme be improved? What software should be used?
Under load, a master-slave structure usually implies that only writes and updates go to the master, and the master pushes the changes out to the slaves.
You can use built-in nginx functionality: configure proxying with caching. A request goes to a slave; if the content is in its cache, the slave serves it, and if not, it goes to the master, fetches the content, caches it and returns it to the client. In that case you do not need to copy anything at all; to add a new server you just add its name to the list of servers for static content.
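A minimal sketch of such a caching proxy on a slave, assuming hypothetical host names and cache paths:

# nginx on a slave (hypothetical names and paths)
proxy_cache_path /var/cache/nginx/static levels=1:2 keys_zone=static:50m
                 max_size=10g inactive=7d use_temp_path=off;

server {
    listen      80;
    server_name cdn1.example.com;

    location / {
        proxy_pass        http://master.example.com;
        proxy_cache       static;
        proxy_cache_valid 200 301 302 12h;
        # keep serving stale content if the master is unreachable,
        # which preserves one of the original scheme's advantages
        proxy_cache_use_stale error timeout updating http_500 http_502 http_503;
    }
}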
It might be worth changing the distribution scheme from "slaves pull from the master" to "the master pushes to the slaves" to cut down the number of file-change scans (the scan happens once, at the moment the master's data is updated), and also using rsync's batch mode:
serverfault.com/questions/137119/rsync-to-multiple...
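A sketch of the push with batch mode, assuming hypothetical paths and host names: the file scan happens once on the master against a local reference copy, and the resulting batch is simply replayed on every slave without rescanning anything there.

# on the master: scan once, record the delta and update the reference copy
rsync -a --delete --write-batch=/tmp/static.batch /var/www/static/ /var/www/static-ref/

# replay the same batch on each slave (same options as when it was written)
for host in slave1 slave2 slave3; do
    ssh "$host" rsync -a --delete --read-batch=- /var/www/static/ < /tmp/static.batch
done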
If the URLs change on update (i.e. new content is only added and the old content is removed), or if it is possible to change them during an update, you can try nginx + proxy_pass + nginx caching.
Or, if the URLs do not change, the same thing, but with a script that resets the cache for the updated files according to a list.
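One way to do that reset (a sketch, assuming the default proxy_cache_key of $scheme$proxy_host$request_uri plus a hypothetical cache directory and master name) is to delete the on-disk cache entries directly, since nginx names each entry after the MD5 of its cache key:

#!/bin/bash
# drop cached entries for a list of updated URIs (one per line in updated.list)
CACHE_DIR=/var/cache/nginx/static
while read -r uri; do
    key="httpmaster.example.com${uri}"                 # $scheme$proxy_host$request_uri
    hash=$(printf '%s' "$key" | md5sum | awk '{print $1}')
    find "$CACHE_DIR" -type f -name "$hash" -delete
done < updated.list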
Tell rsync to transfer whole files rather than computing deltas (--whole-file), and do not compare checksums (drop the --checksum flag if it is set), since that reads every file in full.
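For example, the pull command might look like this (hypothetical names and paths):

# no --checksum here: files are compared by size and mtime only
rsync -a --whole-file --delete rsync://master.example.com/static/ /var/www/static/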
Strange: even with 50k files, rsync does not load my CPU during the scan and finishes very quickly... maybe the problem is a slow disk subsystem on the server?
I'm just offering another option, not a solution to your problem)
If there are a lot of files and you can't get away from the current scheme, there is this option:
Keep a special file on the master (the data stored in it as a simple table) that contains the sequence number of each uploaded file and its path, for example:
...
9 /var/www/backup/2016-05-15.tar
10 /etc/postfix/main.cf
plus another file that contains the number of the last update.
The client connects and compares its own number with the last update number on the server; say the client's last downloaded file is number 8.
8 < 10, so it downloads files 9 and 10 from the server: it looks up their paths (awk to the rescue) and copies them with plain scp)
I think the idea is clear?
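A rough client-side sketch of that idea, with hypothetical file names (manifest.txt holding "number path" lines, last.txt holding the latest number on the master, last.local on the client):

#!/bin/bash
MASTER=master.example.com
LOCAL_LAST=$(cat /var/lib/cdn/last.local 2>/dev/null || echo 0)
REMOTE_LAST=$(ssh "$MASTER" cat /var/lib/cdn/last.txt)

if [ "$LOCAL_LAST" -lt "$REMOTE_LAST" ]; then
    # pick the paths of all entries numbered higher than ours and copy them
    ssh "$MASTER" cat /var/lib/cdn/manifest.txt |
    awk -v n="$LOCAL_LAST" '$1+0 > n+0 {print $2}' |
    while read -r path; do
        mkdir -p "$(dirname "$path")"
        scp "$MASTER:$path" "$path" < /dev/null
    done
    echo "$REMOTE_LAST" > /var/lib/cdn/last.local
fi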
nginx + proxy_store ( nginx.org/ru/docs/http/ngx_http_proxy_module.html#... ), so that files "appear" on the slaves instantly, and run the rsync pull less often, with a time offset between the slaves.
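A minimal proxy_store sketch on a slave, assuming hypothetical host names and paths: files already on local disk are served directly, and anything missing is fetched from the master and stored under the same path.

server {
    listen      80;
    server_name cdn1.example.com;
    root        /var/www/static;

    location / {
        try_files $uri @fetch;          # serve locally, fall back to the master
    }

    location @fetch {
        proxy_pass         http://master.example.com;
        proxy_store        on;
        proxy_store_access user:rw group:r all:r;
        proxy_temp_path    /var/www/static_tmp;   # same filesystem as root
        root               /var/www/static;
    }
}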