How to organize parallel computing on Node and AWS?
Good afternoon.
How to organize parallel processing of large amounts of data in Node?
To be more specific: say I have several dozen files of several GB each. Each file contains an array of data, and the task is to compute overall statistics on unique values.
1. Read each file
2. Calculate statistics for each file
3. Combine the per-file results into one overall statistic
Currently all of this is done on the client side by processing each file sequentially, in a single service worker. I would like to organize it as a parallel computation.
It is clear how this could be parallelized on the client side, but in small tests, parallel reads (from different workers) of the files in the Cache API, where we store the large data arrays downloaded from the server, performed worse than sequential reads in a single worker.
We are therefore thinking about moving these calculations to the server side.
What is the best way to use the AWS arsenal for this?
We are leaning toward AWS Lambda, but I have not found examples of this kind of use.
On the server side, the files are stored in S3.
I would be grateful for any hints.
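A common way to structure this on AWS is fan-out/fan-in: one Lambda invocation computes the statistics for one S3 file, and a final step merges the per-file results. Here is a minimal sketch of such a per-file worker in Node.js; the event shape `{ bucket, key }` and the newline-delimited one-value-per-line format are assumptions, not anything from the question:

```javascript
// Per-file statistics worker (Lambda handler).
// Assumes the event carries { bucket, key } and that each S3 object
// is newline-delimited text, one value per line -- adjust the parsing
// to your real format.
const { S3Client, GetObjectCommand } = require("@aws-sdk/client-s3");
const readline = require("readline");

const s3 = new S3Client({});

exports.handler = async (event) => {
  const { bucket, key } = event;
  const { Body } = await s3.send(
    new GetObjectCommand({ Bucket: bucket, Key: key })
  );

  // Stream the object line by line so a multi-GB file never has to
  // fit into the Lambda's memory at once.
  const counts = new Map();
  const rl = readline.createInterface({ input: Body, crlfDelay: Infinity });
  for await (const line of rl) {
    if (!line) continue;
    counts.set(line, (counts.get(line) || 0) + 1);
  }

  // For high-cardinality data this result can exceed Lambda's 6 MB
  // response limit; in that case write it to S3 and return only its key.
  return { key, counts: Object.fromEntries(counts) };
};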
Why not use Lambda with Python and boto3? There are more examples...
I had a long-running task in Lambda; I wrote interim values to an SQS queue and read them back while tracing the Lambda run. You can also use Step Functions:
https://aws.amazon.com/en/step-functions/?nc1=h_ls...
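Step Functions fits this task well: a Map state fans out one branch per input file, runs the worker Lambda for each, and collects the results for a final merge step. If you would rather keep the orchestration in plain Node, here is a hedged sketch of the same fan-out/fan-in from a coordinator; the function name `statsWorker` refers to the hypothetical per-file handler sketched above, and the single ListObjectsV2 call assumes the bucket holds at most 1000 files (paginate otherwise):

```javascript
// Fan-out/fan-in coordinator: invoke one worker Lambda per S3 file
// in parallel, then merge the per-file unique-value counts.
const { LambdaClient, InvokeCommand } = require("@aws-sdk/client-lambda");
const { S3Client, ListObjectsV2Command } = require("@aws-sdk/client-s3");

const lambda = new LambdaClient({});
const s3 = new S3Client({});

async function invokeWorker(bucket, key) {
  const res = await lambda.send(
    new InvokeCommand({
      FunctionName: "statsWorker", // hypothetical worker from the sketch above
      Payload: JSON.stringify({ bucket, key }),
    })
  );
  return JSON.parse(Buffer.from(res.Payload).toString());
}

async function run(bucket) {
  // Assumes at most 1000 objects; use the ContinuationToken to paginate.
  const { Contents = [] } = await s3.send(
    new ListObjectsV2Command({ Bucket: bucket })
  );

  // Fan out: one invocation per file, all in flight at once.
  const results = await Promise.all(
    Contents.map((obj) => invokeWorker(bucket, obj.Key))
  );

  // Fan in: merge the per-file counts into one overall statistic.
  const total = new Map();
  for (const { counts } of results) {
    for (const [value, n] of Object.entries(counts)) {
      total.set(value, (total.get(value) || 0) + n);
    }
  }
  return total;
}
```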
"How to organize parallel processing of large amounts of data in Node?" This is more a question about Node than about Amazon. As far as I understand, there is support for parallel data processing, but no multithreading at the JavaScript language level.
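Worth noting: recent versions of Node do ship a built-in worker_threads module that provides thread-level parallelism on the server as well. A minimal sketch of the same per-file fan-out done locally with worker_threads; the file paths on the CLI and the one-value-per-line format are assumptions:

```javascript
// Parallel per-file statistics with Node's built-in worker_threads.
// Each worker counts unique values in one file; the main thread merges.
const { Worker, isMainThread, parentPort, workerData } = require("worker_threads");
const fs = require("fs");
const readline = require("readline");

if (isMainThread) {
  const files = process.argv.slice(2); // file paths passed on the CLI

  const jobs = files.map(
    (file) =>
      new Promise((resolve, reject) => {
        const worker = new Worker(__filename, { workerData: file });
        worker.once("message", resolve);
        worker.once("error", reject);
      })
  );

  Promise.all(jobs).then((perFile) => {
    // Fan in: merge per-file counts into one overall statistic.
    const total = new Map();
    for (const counts of perFile) {
      for (const [value, n] of Object.entries(counts)) {
        total.set(value, (total.get(value) || 0) + n);
      }
    }
    console.log("unique values:", total.size);
  });
} else {
  // Worker: stream one file line by line and count unique values.
  const counts = new Map();
  const rl = readline.createInterface({
    input: fs.createReadStream(workerData),
    crlfDelay: Infinity,
  });
  rl.on("line", (line) => {
    if (line) counts.set(line, (counts.get(line) || 0) + 1);
  });
  rl.on("close", () => parentPort.postMessage(Object.fromEntries(counts)));
}
```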