How to organize file synchronization for your own CDN?
We needed to serve a moderate volume of static content (250-500 TB/month). We built our own "CDN" as N Nginx servers with data synchronization via rsync. It is important to us that new content propagates across the servers quickly after an update, so we set it up as master-slave, where the master is the central server to which content is uploaded. The master works in passive mode: rsync runs on each slave from cron and pulls the data.
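For reference, the slave side of this scheme is typically just a cron entry that pulls from the master; a minimal sketch, assuming hypothetical host names, module name and paths:

# /etc/cron.d/cdn-sync on each slave (hypothetical names and paths)
* * * * * root rsync -a --delete rsync://master.example.com/static/ /var/www/static/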
The advantages of this scheme are as follows:
- if the master fails, each server still has a copy of all the data
- the master knows nothing about the slaves, so adding a new server to the group is easy
- to push out an update, you just upload the new content (no other action is required)
The downside:
Although the amount of static content is small (5-6 GB), ten servers pulling from the master with rsync every minute put a noticeable load on it.
I would like to keep all the advantages and somehow deal with this downside. As I understand it, rsync has to rescan the files on every run, and that is what eats the CPU, even though 99% of the time nothing changes (updates happen 2-3 times a day).
How can this scheme be improved? What software should be used?
Under load, a master-slave structure usually implies that only writes and updates go to the master, and the master pushes the changes out to the slaves.
You can use built-in nginx functionality: configure proxying with caching. A request goes to a slave; if the content is in its cache, the slave serves it, and if not, it goes to the master, fetches the content, caches it and returns it to the client. In that case you do not need to copy anything at all; to add a new server you just add its name to the list of servers for static content.
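A minimal sketch of such a caching proxy on a slave, assuming hypothetical host names and cache paths:

# nginx on a slave (hypothetical names and paths)
proxy_cache_path /var/cache/nginx/static levels=1:2 keys_zone=static:50m
                 max_size=10g inactive=7d use_temp_path=off;

server {
    listen      80;
    server_name cdn1.example.com;

    location / {
        proxy_pass        http://master.example.com;
        proxy_cache       static;
        proxy_cache_valid 200 301 302 12h;
        # keep serving stale content if the master is unreachable,
        # which preserves one of the original scheme's advantages
        proxy_cache_use_stale error timeout updating http_500 http_502 http_503;
    }
}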
It might be worth changing the distribution scheme from "slaves pull from the master" to "the master pushes to the slaves" to cut down the number of file-change scans (the scan happens once, at the moment the master's data is updated), and also using rsync's batch mode:
serverfault.com/questions/137119/rsync-to-multiple...
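A sketch of the push with batch mode, assuming hypothetical paths and host names: the file scan happens once on the master against a local reference copy, and the resulting batch is simply replayed on every slave without rescanning anything there.

# on the master: scan once, record the delta and update the reference copy
rsync -a --delete --write-batch=/tmp/static.batch /var/www/static/ /var/www/static-ref/

# replay the same batch on each slave (same options as when it was written)
for host in slave1 slave2 slave3; do
    ssh "$host" rsync -a --delete --read-batch=- /var/www/static/ < /tmp/static.batch
done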
If the URLs change on update (i.e. new content is only added and the old content is removed), or if it is possible to change them during an update, you can try nginx + proxy_pass + nginx caching.
Or, if the URLs do not change, the same thing, but with a script that resets the cache for the updated files according to a list.
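One way to do that reset (a sketch, assuming the default proxy_cache_key of $scheme$proxy_host$request_uri plus a hypothetical cache directory and master name) is to delete the on-disk cache entries directly, since nginx names each entry after the MD5 of its cache key:

#!/bin/bash
# drop cached entries for a list of updated URIs (one per line in updated.list)
CACHE_DIR=/var/cache/nginx/static
while read -r uri; do
    key="httpmaster.example.com${uri}"                 # $scheme$proxy_host$request_uri
    hash=$(printf '%s' "$key" | md5sum | awk '{print $1}')
    find "$CACHE_DIR" -type f -name "$hash" -delete
done < updated.list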
Tell rsync to transfer whole files rather than computing deltas (--whole-file), and do not compare checksums (drop the --checksum flag if it is set), since that reads every file in full.
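For example, the pull command might look like this (hypothetical names and paths):

# no --checksum here: files are compared by size and mtime only
rsync -a --whole-file --delete rsync://master.example.com/static/ /var/www/static/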
Strange: even with 50k files, rsync does not load my CPU during the scan and finishes very quickly... maybe the problem is a slow disk subsystem on the server?
I'm just offering another option, not a solution to your problem)
If there are a lot of files and you can't get away from the current scheme, there is this option:
Keep a special file on the master (the data stored in it as a simple table) that contains the sequence number of each uploaded file and its path, for example:
...
9 /var/www/backup/2016-05-15.tar
10 /etc/postfix/main.cf
plus another file that contains the number of the last update.
The client connects and compares its own number with the last update number on the server; say the client's last downloaded file is number 8.
8 < 10, so it downloads files 9 and 10 from the server: it looks up their paths (awk to the rescue) and copies them with plain scp)
I think the idea is clear?
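A rough client-side sketch of that idea, with hypothetical file names (manifest.txt holding "number path" lines, last.txt holding the latest number on the master, last.local on the client):

#!/bin/bash
MASTER=master.example.com
LOCAL_LAST=$(cat /var/lib/cdn/last.local 2>/dev/null || echo 0)
REMOTE_LAST=$(ssh "$MASTER" cat /var/lib/cdn/last.txt)

if [ "$LOCAL_LAST" -lt "$REMOTE_LAST" ]; then
    # pick the paths of all entries numbered higher than ours and copy them
    ssh "$MASTER" cat /var/lib/cdn/manifest.txt |
    awk -v n="$LOCAL_LAST" '$1+0 > n+0 {print $2}' |
    while read -r path; do
        mkdir -p "$(dirname "$path")"
        scp "$MASTER:$path" "$path" < /dev/null
    done
    echo "$REMOTE_LAST" > /var/lib/cdn/last.local
fi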
nginx + proxy_store ( nginx.org/ru/docs/http/ngx_http_proxy_module.html#... ), so that files "appear" on the slaves instantly, and run the rsync pull less often, with a time offset between the slaves.
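A minimal proxy_store sketch on a slave, assuming hypothetical host names and paths: files already on local disk are served directly, and anything missing is fetched from the master and stored under the same path.

server {
    listen      80;
    server_name cdn1.example.com;
    root        /var/www/static;

    location / {
        try_files $uri @fetch;          # serve locally, fall back to the master
    }

    location @fetch {
        proxy_pass         http://master.example.com;
        proxy_store        on;
        proxy_store_access user:rw group:r all:r;
        proxy_temp_path    /var/www/static_tmp;   # same filesystem as root
        root               /var/www/static;
    }
}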