How to split data in a Pandas row?

Python

Joseph Goodman, 2020-08-01 08:48:26

Good afternoon. I'm trying to get to grips with the Pandas library, and I want to study the number and features of films of different genres in one dataset.

The dataframe has a column in which genres are separated by the | character.
I want to transform the data so that each row holds only one genre, with films repeated once per genre. After that I can simply group the data by the name of the film.
I keep thinking the split function should be used here, but I can't work out how to take the idea further.


2 answer(s)

Dmitry, 2020-08-01
@lolaevv

>>> import pandas as pd
>>> df = pd.DataFrame([['123', 'Anime|Action'], ['321', 'Adventure|Comedy']], columns=['title', 'genre'])
>>> df
  title             genre
0   123      Anime|Action
1   321  Adventure|Comedy
>>> df['genre'] = df['genre'].apply(lambda x: x.split('|'))
>>> df
  title                genre
0   123      [Anime, Action]
1   321  [Adventure, Comedy]
>>> df.explode('genre')
  title      genre
0   123      Anime
0   123     Action
1   321  Adventure
1   321     Comedy

pandas.DataFrame.explode
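
The same two steps can also be written as one chain with the vectorized `.str.split` accessor. This is only a sketch on the sample data from the session above: `long_df` and `films_per_genre` are names made up for illustration, and the count at the end is an assumption about the kind of per-genre study the question describes.

```python
import pandas as pd

# Sample data copied from the REPL session above.
df = pd.DataFrame(
    [["123", "Anime|Action"], ["321", "Adventure|Comedy"]],
    columns=["title", "genre"],
)

# Split each genre string into a list, then give every list element its own row.
# reset_index drops the repeated index labels (0, 0, 1, 1) that explode leaves behind.
long_df = (
    df.assign(genre=df["genre"].str.split("|"))
      .explode("genre")
      .reset_index(drop=True)
)

# Hypothetical follow-up: number of distinct films in each genre.
films_per_genre = long_df.groupby("genre")["title"].nunique()

print(long_df)
print(films_per_genre)
```

Using `assign` leaves the original `df` untouched, so the raw `genre` strings are still there if they are needed later.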

PavelMos, 2020-08-01
@PavelMos

IMHO it makes more sense to create indicator attributes such as "is it a comedy? yes / no (1/0)". To do this, add one extra column per genre. If you duplicate rows instead, the dataframe will grow considerably.

import pandas

df1 = pandas.DataFrame.from_records((
    (1, 'xxx', 'Adv|Ani|Doc'),
    (2, 'yyy', 'Adv|Doc'),
    (3, 'zzz', 'Comedy|Doc')),
    columns=['movieId', 'title', 'genres'])
genres_list = ('Adv', 'Ani', 'Doc', 'Comedy')
for i in genres_list:
    df1[i] = 0  # first fill every genre column with zeros
print(df1)
for idx, row in df1.iterrows():
    # split this film's genre string into a list of genres
    film_genres = row['genres'].split('|')
    for g in genres_list:
        if g in film_genres:
            df1.loc[idx, g] = 1  # mark the genres this film belongs to
print(df1)
   movieId title       genres  Adv  Ani  Doc  Comedy
0        1   xxx  Adv|Ani|Doc    0    0    0       0
1        2   yyy      Adv|Doc    0    0    0       0
2        3   zzz   Comedy|Doc    0    0    0       0
   movieId title       genres  Adv  Ani  Doc  Comedy
0        1   xxx  Adv|Ani|Doc    1    1    1       0
1        2   yyy      Adv|Doc    1    0    1       0
2        3   zzz   Comedy|Doc    0    0    1       1
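
For larger dataframes the row-by-row loop can get slow. A possible shortcut, sketched here on the same sample data, is `Series.str.get_dummies`, which builds the 0/1 columns in one call. The names `flags` and `wide_df` are made up for illustration. Note that the genre columns come out in alphabetical order rather than in the order of `genres_list`.

```python
import pandas as pd

# Same sample data as in the answer above.
df1 = pd.DataFrame.from_records(
    [
        (1, "xxx", "Adv|Ani|Doc"),
        (2, "yyy", "Adv|Doc"),
        (3, "zzz", "Comedy|Doc"),
    ],
    columns=["movieId", "title", "genres"],
)

# One 0/1 column per distinct genre found in the '|'-separated strings.
flags = df1["genres"].str.get_dummies(sep="|")

# Attach the indicator columns to the original frame (rows align on the index).
wide_df = df1.join(flags)

print(wide_df)
```

With this approach there is no need to list the genres by hand, since the columns are derived from the data itself.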