Docker
Jolt, 2021-09-14 01:36:43

How to organize redundancy (duplication) of the Spark driver?

Events land in Kafka; they need to be filtered and written to several different databases.
Right now everything is built on Apache Spark (PySpark).
Originally, each Docker container ran its own local SparkContext with its own writeStream.foreachBatch.
But that turned out to be very memory-hungry, so I have now moved everything into a single container, where all writeStream queries are attached to one context.
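
Roughly, the current single-container setup looks like this (a minimal sketch; the broker address, topic name, filter condition, and JDBC connection details below are placeholders, not my real config):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-router").getOrCreate()

# One SparkContext reads the event stream from Kafka.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "kafka:9092")  # placeholder broker
    .option("subscribe", "events")                    # placeholder topic
    .load()
    .selectExpr("CAST(value AS STRING) AS value")
)

def route_batch(batch_df, batch_id):
    # Filter each micro-batch and write the matching rows to one of the databases.
    (batch_df.filter(col("value").contains("orders"))          # placeholder filter
        .write.format("jdbc")
        .option("url", "jdbc:postgresql://db1:5432/orders")    # placeholder database
        .option("dbtable", "events")
        .mode("append")
        .save())

query = events.writeStream.foreachBatch(route_batch).start()
query.awaitTermination()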

In any case, I would like redundancy, so that two containers on different machines run the same job. How can this be done?
I assume a separate Spark cluster with two masters and N workers is required.
And how do I duplicate the job itself (the Spark driver)?


1 answer
Prompt Attestation, 2021-09-20
@PromptAttestation

Can I ask what Spark drivers are?
