How to process rdd?


denislysenko, 2021-12-07 19:52:42

Python



I have an RDD that looks like this:

my_rdd.take(10)

[['movieId', 'title', 'genres'],
 ['1', 'Toy Story (1995)', 'Adventure|Animation|Children|Comedy|Fantasy'],
 ['2', 'Jumanji (1995)', 'Adventure|Children|Fantasy'],
 ['3', 'Grumpier Old Men (1995)', 'Comedy|Romance'],
 ['4', 'Waiting to Exhale (1995)', 'Comedy|Drama|Romance'],
 ['5', 'Father of the Bride Part II (1995)', 'Comedy'],
 ['6', 'Heat (1995)', 'Action|Crime|Thriller'],
 ['7', 'Sabrina (1995)', 'Comedy|Romance'],
 ['8', 'Tom and Huck (1995)', 'Adventure|Children'],
 ['9', 'Sudden Death (1995)', 'Action']]

I also have these variables, which I need to use to filter my_rdd:

arg_genres = 'Action' # filter by genre
year_to = 2012 # filter by year (upper bound)
year_from = 2005 # filter by year (lower bound)
regexp = 'Toy Story' # filter by movie title

I don't understand how to build a new RDD that contains only the movies satisfying all of the filter variables above.
How can I get a new RDD with the filtered movies?
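
One possible approach, as a minimal sketch rather than a definitive answer: write a plain Python predicate and pass it to `RDD.filter`. The sketch assumes the release year is always the four-digit number in parentheses at the end of the title, that the genre must match one entry of the `|`-separated list exactly, that the year bounds are inclusive, and that the regexp is applied to the title with `re.search`. The names `make_predicate`, `keep`, `YEAR_RE` and `rows` are hypothetical and only serve the illustration. A local list stands in for the RDD so the predicate can be tried without a Spark session.

```python
import re

# Assumption: the year is the "(YYYY)" group at the very end of the title.
YEAR_RE = re.compile(r'\((\d{4})\)\s*$')

def make_predicate(genre, year_from, year_to, title_regexp):
    """Build a row -> bool function that combines all four filters."""
    title_re = re.compile(title_regexp)

    def keep(row):
        if len(row) != 3:
            return False
        _, title, genres = row
        match = YEAR_RE.search(title)
        if match is None:
            # Header row, or a title without a trailing year: drop it.
            return False
        year = int(match.group(1))
        return (genre in genres.split('|')
                and year_from <= year <= year_to
                and title_re.search(title) is not None)

    return keep

# A few rows in the same shape as my_rdd.take(10), used as a local stand-in.
rows = [
    ['movieId', 'title', 'genres'],
    ['1', 'Toy Story (1995)', 'Adventure|Animation|Children|Comedy|Fantasy'],
    ['6', 'Heat (1995)', 'Action|Crime|Thriller'],
    ['9', 'Sudden Death (1995)', 'Action'],
]

# Illustrative filter values chosen so that one sample row matches.
keep = make_predicate('Action', 1990, 2000, 'Heat')
filtered_rows = list(filter(keep, rows))

# On the actual RDD the same predicate would be used as:
#   filtered_rdd = my_rdd.filter(
#       make_predicate(arg_genres, year_from, year_to, regexp))
```

With the values from the question ('Action', 2005 to 2012, 'Toy Story'), none of the ten sample rows shown would pass, since they are all from 1995 and 'Toy Story (1995)' is not tagged 'Action'. The filter values above were picked only to show a match.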