Docker Swarm replicated/global mode + NFS or GlusterFS: how do I get them to work together?
Folks, please help!
I am building a container orchestration system on Docker Swarm. Given: external IP -> haProxy -> Docker Swarm (1 master, 5 nodes). I start a service, for example Jenkins, in global mode (or in replicated mode with 1 + n replicas). The configuration is stored on a GlusterFS disk; the GlusterFS server is installed on all 6 Swarm machines and the disk is mounted on each of them. I attach the storage to the service via a bind mount. (I'll say up front that attaching it through a Docker volume created on top of the share does not solve the problem.)
After launch, the site does not work correctly: pages do not always load, the cache is reset, and the username and password are requested constantly. As soon as I start the service with replicas = 1, or even as a standalone container, the problems disappear and everything works fine.
At first, the storage was built on NFS: I set up a separate server with a fast disk, exported the device, and mounted it on all nodes. When I saw the problem, I suspected file-lock conflicts between the Jenkins backends reading and writing the same files, and checked this against the haProxy configuration, which was balancing in roundrobin mode. The first thing I did was stop sending traffic to all nodes and redirect it to the master only; the situation did not improve. Then I changed the balancing mode to leastconn, so that haProxy would not cycle through the servers in order but would "stick" to the last web server that responded successfully; the situation did not improve either. As soon as I enable replicated or global mode, pages start getting mixed up and the frontend becomes unstable.
Then, after reading quite a lot on forums, I decided to rebuild the storage on GlusterFS, since it is recommended for use with Docker Swarm: it synchronizes changes on the fly and keeps the shares fault-tolerant. However, this did not solve the problem.
Question: how do I correctly implement storage and use of configuration files, database files, and logs for Swarm services in replicated or global mode? How do I eliminate file-lock conflicts? How do I correctly set up failover replication of a service? If I leave replicas = 1, the problem goes away, and even if a node crashes, Swarm will successfully start the service on another node after a while, but I would like to get the benefit of zero downtime. Thanks in advance, I'll be happy to answer any follow-up questions!
I may be wrong, since I have no real experience implementing the scheme you describe, but in the case of Jenkins you first need to make sure that it supports this mode of operation at all. The main problem with running it in a cluster is the lack of native support for that :)
P.S. I'd be glad to be proven wrong, and I'll be waiting for an answer too :)
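To illustrate why missing cluster support could produce exactly the symptoms from the question, here is a minimal Python sketch. It rests on assumptions that are not confirmed anywhere in this thread: each replica keeps login sessions only in its own memory, the client holds a single session cookie that the latest response overwrites, and requests are spread round-robin across the replicas. `Replica`, `simulate`, and the cookie format are hypothetical names made up for the illustration; they are not Jenkins, haProxy, or Docker APIs.

```python
import itertools


class Replica:
    """Hypothetical stand-in for one service task that stores sessions in memory."""

    def __init__(self, name):
        self.name = name
        self.sessions = set()   # sessions known to this replica only
        self.issued = 0

    def handle(self, cookie):
        # A replica recognizes only the session cookies it issued itself.
        if cookie in self.sessions:
            return "ok", cookie
        # Unknown cookie: force a login and issue a fresh session cookie.
        self.issued += 1
        new_cookie = f"{self.name}-session-{self.issued}"
        self.sessions.add(new_cookie)
        return "login required", new_cookie


def simulate(replica_count, request_count):
    """Send requests from a single client through a round-robin balancer."""
    replicas = [Replica(f"task-{i}") for i in range(replica_count)]
    balancer = itertools.cycle(replicas)   # assumed round-robin distribution
    cookie = None                          # the client keeps one cookie at a time
    outcomes = []
    for _ in range(request_count):
        status, cookie = next(balancer).handle(cookie)
        outcomes.append(status)
    return outcomes


if __name__ == "__main__":
    print("1 replica: ", simulate(1, 6))
    print("3 replicas:", simulate(3, 6))
```

Under these assumptions a single replica asks for a login once and then serves every request, while with three replicas every request lands on a task that does not know the current cookie, so the login prompt never goes away. It may also be relevant that Swarm's ingress routing mesh itself balances connections to a published port across the service's tasks, which could be why pointing haProxy at the master alone changed nothing; I have not verified this for the setup in the question.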