How to read multiple parquet files in Spark?
Hello comrades! Please help me deal with Spark.
There is a directory containing a bunch of parquet files. The names of these files all follow the same format: "DD-MM-YYYY". For example: '01-10-2018', '02-10-2018', '03-10-2018', etc. As input parameters, I get a start date (dateFrom) and an end date (dateTo). The values of these variables are dynamic.
If I use the following code, the program hangs:
val mf = spark.read.parquet("/PATH_TO_THE_FOLDER/*").filter($"DATE".between(dateFrom + " 00:00:00", dateTo + " 23:59:59"))
mf.show()
Because of the `*`, it checks all the files in the directory, and that is why the program hangs.
If I read the files one at a time and union them, it works:
val mf1 = spark.read.parquet("/PATH_TO_THE_FOLDER/01-10-2018")
val mf2 = spark.read.parquet("/PATH_TO_THE_FOLDER/02-10-2018")
val result = mf1.union(mf2).distinct()
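One way to make the hard-coded union dynamic is to build the list of per-day paths between dateFrom and dateTo with java.time and pass them all to a single spark.read.parquet call, which accepts multiple paths. This is a sketch, not a tested answer; the helper name datePaths and the base path are placeholders:

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// "dd-MM-yyyy" matches the directory naming from the question.
val fmt = DateTimeFormatter.ofPattern("dd-MM-yyyy")

// Hypothetical helper: one path per day in the inclusive range [from, to].
def datePaths(base: String, from: LocalDate, to: LocalDate): Seq[String] =
  Iterator.iterate(from)(_.plusDays(1))
    .takeWhile(!_.isAfter(to))
    .map(d => s"$base/${d.format(fmt)}")
    .toSeq

val paths = datePaths("/PATH_TO_THE_FOLDER",
  LocalDate.parse("01-10-2018", fmt),
  LocalDate.parse("03-10-2018", fmt))
// paths: Seq of "/PATH_TO_THE_FOLDER/01-10-2018" through ".../03-10-2018"

// spark.read.parquet is variadic, so only the listed files are scanned:
// val mf = spark.read.parquet(paths: _*)
```

This way Spark never touches files outside the requested date range, so there is no need for the `*` glob or the between-filter on the DATE column.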