denislysenko, 2021-10-04 17:01:04
Python

How to make one dataset from two csv files?

There are two files:
movies.csv and ratings.csv

movies.csv looks like this:
movieId,title,genres
1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
2,Jumanji (1995),Adventure|Children|Fantasy
3,Grumpier Old Men (1995),Comedy|Romance
4,Waiting to Exhale (1995),Comedy|Drama|Romance
5,Father of the Bride Part II (1995),Comedy
6,Heat (1995),Action|Crime|Thriller
7,Sabrina (1995),Comedy|Romance
...

ratings.csv looks like this:
userId,movieId,rating,timestamp
1,1,4.0,964982703
1,3,4.0,964981247
2,6,4.0,964982224
2,47,5.0,964983815
2,50,5.0,964982931
3,70,3.0,964982400
3,101,5.0,964980868
3,110,4.0,964982176
3,333,4.0,1445715029
...

How can I build one dataset from these two files that keeps all the fields of movies.csv and adds one more field, the average film rating? In other words, since ratings.csv can contain several ratings for the same movie, how do I compute the average rating for each movie and attach it to that movie, using only Python?

import csv

# Read movies.csv into a list of rows (the first row is the header).
data_movies = []
with open('files/movies.csv', encoding='utf-8', newline='') as file:
    reader = csv.reader(file, delimiter=',')
    for row in reader:
        data_movies.append(row)

# Read ratings.csv the same way.
data_rating = []
with open('files/ratings.csv', encoding='utf-8', newline='') as file:
    reader = csv.reader(file, delimiter=',')
    for row in reader:
        data_rating.append(row)


Now I have two tables, each stored as a list of rows (a two-dimensional list): data_movies and data_rating.

Next I need to build the final result_data table, which should contain all the fields of data_movies plus one more field, the average movie rating.
How do I compute the average rating for each movie?
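
One possible approach, as a minimal sketch using only the standard library: walk through data_rating once, keep a running sum and count of ratings per movieId, then append the average to each row of data_movies. The function name add_average_rating and the column name avg_rating are made up for illustration. The sketch assumes both lists are non-empty and start with a header row, as in the samples above, and it stores None for movies that have no ratings, which is also just an assumption about the desired behavior.

```python
from collections import defaultdict


def add_average_rating(data_movies, data_rating):
    """Return new movie rows with an extra average-rating column.

    Hypothetical helper: expects both lists to start with a header row.
    """
    totals = defaultdict(float)  # movieId -> sum of ratings
    counts = defaultdict(int)    # movieId -> number of ratings

    # Skip the header row (userId,movieId,rating,timestamp).
    for row in data_rating[1:]:
        if not row:  # csv.reader yields [] for blank lines
            continue
        movie_id = row[1]
        totals[movie_id] += float(row[2])
        counts[movie_id] += 1

    header, *rows = data_movies
    result_data = [header + ['avg_rating']]  # 'avg_rating' is an assumed name
    for row in rows:
        if not row:
            continue
        movie_id = row[0]
        if movie_id in counts:
            avg = totals[movie_id] / counts[movie_id]
        else:
            avg = None  # assumption: movies without ratings get None
        result_data.append(row + [avg])
    return result_data
```

With the two lists read above, this would be called as result_data = add_average_rating(data_movies, data_rating). The movieId values are compared as strings, exactly as csv.reader returns them, so no conversion is needed for the lookup; only the rating is converted to float.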
