Docker
Jolt, 2021-09-14 01:36:43

How to organize redundancy (duplication) of the Spark driver?

Events land in Kafka; they need to be filtered and written to several different databases.
Right now everything is built on Apache Spark (PySpark).
Originally, each Docker container ran its own local SparkContext with its own writeStream.foreachBatch.
But that turned out to be very memory-hungry, so I have now moved everything into a single container, where all writeStream queries are attached to one context.
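
Roughly, the current single-container setup looks like this (a minimal sketch; the broker address, topic name, filter condition, and JDBC connection details below are placeholders, not my real config):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-router").getOrCreate()

# One SparkContext reads the event stream from Kafka.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "kafka:9092")  # placeholder broker
    .option("subscribe", "events")                    # placeholder topic
    .load()
    .selectExpr("CAST(value AS STRING) AS value")
)

def route_batch(batch_df, batch_id):
    # Filter each micro-batch and write the matching rows to one of the databases.
    (batch_df.filter(col("value").contains("orders"))          # placeholder filter
        .write.format("jdbc")
        .option("url", "jdbc:postgresql://db1:5432/orders")    # placeholder database
        .option("dbtable", "events")
        .mode("append")
        .save())

query = events.writeStream.foreachBatch(route_batch).start()
query.awaitTermination()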

In any case, I would like redundancy, so that two containers on different machines run the same job. How can this be done?
I assume a separate Spark cluster with two masters and N workers is required.
And how do I duplicate the job itself (the Spark driver)?


1 answer
Prompt Attestation, 2021-09-20
@PromptAttestation

Can I ask what Spark drivers are?
