What is the map function supposed to do in mapreduce?
Here is the task I am working on.
There is a movies.csv file (about 10,000 lines). In case it matters, it can be downloaded with: wget https://files.grouplens.org/datasets/movielens/ml- ...
The file looks like this:
movieId,title,genres
1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
2,Jumanji (1995),Adventure|Children|Fantasy
3,Grumpier Old Men (1995),Comedy|Romance
4,Waiting to Exhale (1995),Comedy|Drama|Romance
5,Father of the Bride Part II (1995),Comedy
6,Heat (1995),Action|Crime|Thriller
7,Sabrina (1995),Comedy|Romance
8,Tom and Huck (1995),Adventure|Children
9,Sudden Death (1995),Action
...
Using the MapReduce approach, I need a file called mapper.py that prints key/value pairs to the console, where the key is a movie's genre and the value is the movie's title and year. It should be possible to filter movies by title, year, and genre through command-line arguments; if no arguments are passed, all movies should be printed.
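To make the filtering requirement concrete, here is a minimal sketch of a single predicate that applies all three filters. The name movie_matches and its parameters are hypothetical and not part of my mapper.py; it assumes the title, year, and genre list have already been parsed out of the line, and that "no argument" is represented by an empty pattern or an empty genre list.

```python
import re


def movie_matches(title, year, genres, year_from=1800, year_to=2025,
                  name_regexp="", wanted_genres=()):
    """Return True if a movie passes every filter that was supplied."""
    # Year filter: the defaults are wide enough to keep every movie.
    if not (year_from <= year <= year_to):
        return False
    # An empty pattern means the title filter was not requested.
    if name_regexp and not re.search(name_regexp, title):
        return False
    # An empty genre filter keeps everything; otherwise one shared genre is enough.
    if wanted_genres and not set(wanted_genres) & set(genres):
        return False
    return True
```

With no optional arguments every movie passes, which matches the "print all movies" requirement.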
Example:
Take this line:
1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
It has several genres, so mapper.py should print the movie once per genre, like this:
Adventure Toy Story;1995
Animation Toy Story;1995
Children Toy Story;1995
Comedy Toy Story;1995
Fantasy Toy Story;1995
and so on for every movie.
I run it like this: cat files/movies.csv | python3 mapper.py [optional filter arguments]
My mapper.py file looks like this:
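The one-line-in, several-pairs-out behaviour in the example above can be sketched as follows. This is only an illustration under some assumptions: the function name map_line and the TITLE_YEAR pattern are hypothetical, every title is assumed to end in " (YYYY)", and the standard csv module is used so that quoted titles containing commas are split correctly.

```python
import csv
import re

# Matches a trailing "(YYYY)" and captures the title and the year separately.
TITLE_YEAR = re.compile(r"^(?P<title>.*?)\s*\((?P<year>\d{4})\)\s*$")


def map_line(line):
    """Yield (genre, "title;year") pairs for one line of movies.csv."""
    row = next(csv.reader([line]), None)
    if not row or len(row) != 3:
        return  # blank or malformed line
    _movie_id, raw_title, raw_genres = row
    match = TITLE_YEAR.match(raw_title)
    if match is None:
        return  # header line, or a title without a year
    value = "{};{}".format(match.group("title"), match.group("year"))
    for genre in raw_genres.split("|"):
        genre = genre.strip()
        if genre and genre != "(no genres listed)":
            yield genre, value
```

Feeding it the Toy Story line yields five pairs, one per genre, each with the value Toy Story;1995.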
import sys
import argparse
import re
def argpars():
    parser = argparse.ArgumentParser()
    parser.add_argument('-genres',
                        type=str,
                        help='filter by genre',
                        default=''
                        )
    parser.add_argument('-year_from',
                        type=int,
                        help='filter by year (FROM YEAR)',
                        default=1800
                        )
    parser.add_argument('-year_to',
                        type=int,
                        help='filter by year (TO YEAR)',
                        default=2025
                        )
    parser.add_argument('-regexp',
                        type=str,
                        help='filter on the movie name',
                        default=''
                        )
    return parser.parse_args()
def print_result(data, year_from, year_to, name, genres_argument):
    for line in data:
        for key, value in map(line, year_from, year_to, name, genres_argument):
            # sep="\t" avoids the extra spaces that print(key, "\t", value) adds
            print(key, value, sep="\t")


def map(line, year_from, year_to, name, genres_argument):
    # strip the line ending explicitly instead of cutting two characters,
    # so both "\n" and "\r\n" input files work
    # NOTE: a plain split still breaks on quoted titles that contain commas
    list_line = line.rstrip("\r\n").split(",")
    if len(list_line) < 3:
        return
    # filter by year and regexp
    if filter_by_year(year_from, year_to, list_line[1]) and filter_by_regexp(name, list_line[1]):
        # filter by genres
        list_genres_argument = genres_argument.split('|')
        genres = list_line[2].split('|')
        for arg_genre in list_genres_argument:
            for genre in genres:
                if filter_by_genres(genre, arg_genre):
                    key = genre
                    # the title is assumed to end in " (YYYY)"
                    value = '{};{}'.format(list_line[1][:-7], list_line[1][-5:-1])
                    yield key, value
def filter_by_year(year_from, year_to, string):
    # look for a four-digit year in parentheses, e.g. "(1995)"
    match = re.search(r'\((\d{4})\)', string)
    if match is None:
        return False
    return year_from <= int(match.group(1)) <= year_to


def filter_by_regexp(name, string):
    # an empty pattern matches every title
    return re.search(name, string) is not None


def filter_by_genres(genre, genres_argument):
    if genre == '(no genres listed)':
        return False
    # an empty argument means "no genre filter"
    return genres_argument == '' or genre == genres_argument
if __name__ == "__main__":
    args = argpars()
    print_result(sys.stdin, args.year_from, args.year_to, args.regexp, args.genres)
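As for what the map step is for: in MapReduce the mapper only emits key/value pairs, and the framework then sorts them by key so that a reducer receives all values for one key together. The sketch below simulates that locally on the tab-separated lines a mapper like the one above prints. The function name group_mapper_output is hypothetical, and this is a local illustration, not how a real cluster performs the shuffle.

```python
from itertools import groupby


def group_mapper_output(lines):
    """Group "genre<TAB>title;year" lines by genre, as a reducer would see them."""
    pairs = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        key, _, value = line.partition("\t")
        pairs.append((key, value))
    # Sorting stands in for the shuffle phase: equal keys become adjacent.
    pairs.sort(key=lambda pair: pair[0])
    return {
        key: [value for _, value in group]
        for key, group in groupby(pairs, key=lambda pair: pair[0])
    }
```

On the command line, a similar effect comes from piping the mapper output through sort before a reducer script reads it.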