How is data kept consistent across processes in a Node.js cluster?
Gentlemen, I'm new to the world of traditional web backend development and Node.js in particular.
I usually develop software in C/C++, where we often have multi-threaded applications in which all objects live in memory shared by all threads (and primitives such as mutexes and spinlocks are used to synchronize the threads' access to those objects).
However, a Node.js cluster is not multi-threaded but multi-process, and because of this I have a lot of theoretical questions. Gentlemen, please tell me: what are the traditional approaches to keeping information consistent between the processes of a Node.js cluster? What should I take into account when developing a backend application for a Node.js cluster?
For example, suppose we have several processes in a Node.js cluster, and each process has cached a secret code used to authenticate a user's HTTP requests via cookies (having fetched this code from the database). At some point, one of the cluster processes decides to update this secret code. How should the new secret code reach all the other processes? Otherwise, when the user's next HTTP requests are routed through other processes, the user will be denied access.
Second example: we have multiple processes in a Node.js cluster serving an online clothing store. At a user's request, one of the processes selected from the database a list of 500 clothing items matching the user's query. This process returned 100 items to the user as the first page of the response and cached the rest (so as not to hit the database again when the user requests the next pages). But the user's requests for the next pages may be routed through other processes where this cached data does not exist. What should be done about this? Should the cached list of 500 items be duplicated in the other processes? If so, how? Or should the user somehow be tied exclusively to the first process? If so, how?
Also, are my examples realistic? Is the kind of data caching I described actually practiced in Node.js backend applications?
Clustered Node.js uses a shared storage approach instead of shared memory.
Typically this role is played by a Redis cluster, since it has notification mechanisms (subscriptions) and can notify cluster nodes asynchronously.
Things like sessions and client-specific data (user caches) are stored outside the Node.js process, for example in Redis. This increases infrastructure overhead, but it allows seamless restarts and surviving sudden shutdowns of machines in the cluster.
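For example, the secret-code scenario from the question could be handled through Redis pub/sub. This is only a minimal sketch, assuming the ioredis client; the channel name, key name, and rotateSecret function are made up for illustration:

const Redis = require('ioredis');

// two connections: one for regular commands, one dedicated to the subscription
const redis = new Redis();
const subscriber = new Redis();

let cachedSecret = null; // this process's local copy of the secret

// every worker process subscribes; whoever rotates the secret publishes it
subscriber.subscribe('secret-updated');
subscriber.on('message', (channel, newSecret) => {
  cachedSecret = newSecret; // refresh the local copy in this process
});

// called by the process that decides to rotate the secret
async function rotateSecret(newSecret) {
  await redis.set('auth:secret', newSecret);        // persist it for processes that start later
  await redis.publish('secret-updated', newSecret); // notify every other process right away
}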
Standard practice is to put a balancer in front, such as nginx configured as a reverse proxy. If you don't really want to deal with sessions, use ip_hash; it will take the headache away.
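A minimal nginx sketch of that setup (the upstream name and the ports 3000/3001 are just assumptions for illustration):

upstream node_backend {
    ip_hash;  # requests from the same client IP always go to the same Node.js process
    server 127.0.0.1:3000;
    server 127.0.0.1:3001;
}

server {
    listen 80;
    location / {
        proxy_pass http://node_backend;
    }
}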
On your specific questions:
A traditional cluster is a set of machines running many processes via https://nodejs.org/api/cluster.html
Of course, instead of shared memory you get a shared service: you work with the cache the same way you work with the database, and so on. Read up on how horizontal scaling works.
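A minimal sketch of such a process cluster on one machine, using the module linked above (each forked worker is a separate process with its own memory, which is exactly why the shared service is needed):

const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isMaster) {
  // the master only forks workers; it does not serve requests itself
  for (let i = 0; i < os.cpus().length; i++) {
    cluster.fork();
  }
} else {
  // each worker is a separate process with its own memory and its own caches
  http.createServer((req, res) => {
    res.end(`handled by pid ${process.pid}\n`);
  }).listen(3000);
}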
Keep in mind that this is a new area of expertise for you, and also one full of very popular and very overheated opinions. Think of Node.js as glue between services and other solutions. For example, image resizing is better done in C++: the cost of spawning a resizing process is lower than the cost of a resize implemented in Node.js.
Usually such problems are solved through a configuration provider, such as a configuration server: any change to the configuration is propagated across the entire cluster. Within the cluster module this is implemented through its notification (IPC messaging) mechanism; in a large cluster, such things are implemented through Redis subscriptions.
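A sketch of that notification mechanism within a single machine's cluster module (the message shape and the cachedSecret variable are assumptions, not anything standard):

const cluster = require('cluster');

// in the master process: push the new value to every worker
function broadcastSecret(newSecret) {
  for (const id of Object.keys(cluster.workers)) {
    cluster.workers[id].send({ type: 'secret-updated', secret: newSecret });
  }
}

// in each worker process: replace the locally cached copy when notified
let cachedSecret = null;
process.on('message', (msg) => {
  if (msg && msg.type === 'secret-updated') {
    cachedSecret = msg.secret;
  }
});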
See the note about the balancer and ip_hash above. But in general this approach is pretty bad: request only the 100 items you actually need. If pulling data out of the database is a problem, change the database or its structure, or scale the storage. As a last resort, use a shared cache (Redis, memcached).
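For the clothing-store example, that means either paginating at the database level or, as a last resort, keeping the result list in a shared cache visible to every process. A sketch assuming ioredis and a hypothetical searchDatabase() call:

const Redis = require('ioredis');
const redis = new Redis();

async function getPage(query, page, pageSize = 100) {
  const key = `search:${query}`;
  let items = JSON.parse((await redis.get(key)) || 'null');
  if (!items) {
    items = await searchDatabase(query);                   // hypothetical DB call returning all matches
    await redis.set(key, JSON.stringify(items), 'EX', 60); // shared between processes for 60 seconds
  }
  return items.slice((page - 1) * pageSize, page * pageSize);
}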
It is considered bad practice. Caching configuration or data used everywhere (e.g. localization) is the norm, but caching user data easily leads to memory leaks: you never know how many users will hit your resource in a given period of time.
Gentlemen, please tell me: what are the traditional approaches to keeping information consistent between the processes of a Node.js cluster?
const http = require('http');

const hostname = '127.0.0.1';
const port = 3000;

// module-level object acting as an in-process cache;
// each Node.js process holds its own independent copy of it
const I_AM_CACHE = {
  "some": "data",
};

const server = http.createServer((req, res) => {
  res.statusCode = 200;
  res.setHeader('Content-Type', 'text/plain');
  res.end(JSON.stringify(I_AM_CACHE));
});

server.listen(port, hostname, () => {
  console.log(`Server running at http://${hostname}:${port}/`);
});
However, a Node.js cluster is not multi-threaded but multi-process, and because of this I have a lot of theoretical questions. Gentlemen, please tell me: what are the traditional approaches to keeping information consistent between the processes of a Node.js cluster? What should I take into account when developing a backend application for a Node.js cluster?
Everything works like in *nix. If you know C++, you should understand this. How do you pass data between processes? The fastest is certainly stdin/stdout, then sockets; databases are not used much for this, mainly only to prevent race conditions.
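A sketch of passing data over stdin/stdout with child_process (worker.js is a hypothetical script that reads lines from its stdin and writes replies to its stdout):

const { spawn } = require('child_process');

// spawn a hypothetical worker script and talk to it over its stdio pipes
const child = spawn(process.execPath, ['worker.js']);

child.stdout.on('data', (chunk) => {
  console.log('worker replied:', chunk.toString());
});

child.stdin.write(JSON.stringify({ cmd: 'refresh-secret' }) + '\n');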
If you have worked with C++, you will not have problems with memory leaks; just approach the memory consumption model the same way as in C++. Also, don't change the structure of a class after an object has been created, and you will be fine.
I've noticed a trend: only PHP programmers use Node scripts.
As for the clothing store, I would not use a database there. Who knows, maybe that's just my personal subjective opinion, but I would do it without a database. It depends on how many employees are connected to this store: if there are only 4 managers and 1 director, a database is of course just extra hassle. SQL is much less agile than working with arrays/objects in JS, which is what I really love JS for.