Database
denislysenko, 2021-12-21 20:14:16

How to split one column into two columns in a Spark DataFrame?

I have this dataframe:

splited_genres_df.show(15)

+-------+--------------------+---------+
|movieId|               title|   genres|
+-------+--------------------+---------+
|      1|    Toy Story (1995)|Adventure|
|      1|    Toy Story (1995)|Animation|
|      1|    Toy Story (1995)| Children|
|      1|    Toy Story (1995)|   Comedy|
|      1|    Toy Story (1995)|  Fantasy|
|      2|      Jumanji (1995)|Adventure|
|      2|      Jumanji (1995)| Children|
|      2|      Jumanji (1995)|  Fantasy|
|      3|Grumpier Old Men ...|   Comedy|
|      3|Grumpier Old Men ...|  Romance|
|      4|Waiting to Exhale...|   Comedy|
|      4|Waiting to Exhale...|    Drama|
|      4|Waiting to Exhale...|  Romance|
|      5|Father of the Bri...|   Comedy|
|      6|         Heat (1995)|   Action|
+-------+--------------------+---------+
only showing top 15 rows


The title column holds both the name of the film and its year of release. How can I split title into two columns:
title, which stores the name of the film,
year, which stores the year of release?

1 answer(s)
Cheypnow, 2021-12-23
@denislysenko

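df.withColumn("title_new", split(col("title"), " \\(").getItem(0))
   .withColumn("year", regexp_extract(col("title"), "\\((\\d{4})\\)\\s*$", 1))

Note that split takes a regular expression, so the opening parenthesis has to be escaped; the year is pulled out with regexp_extract so that the closing parenthesis does not stay attached to it. The split here is crude (a title that contains its own parentheses will be cut short), but the principle should be clear.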
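
For titles that contain parentheses of their own, a slightly more defensive variant is to anchor on the year at the end of the string. The sketch below assumes PySpark (the question does not say which Spark API is in use) and assumes every year is a four-digit number in parentheses at the end of the title. The names YEAR_SUFFIX, split_title and add_title_and_year are hypothetical and exist only for this illustration. The plain-Python function is there only so the pattern can be tried without a Spark session.

```python
import re

# Assumed pattern: optional whitespace, then a four-digit year in
# parentheses at the very end of the string, e.g. " (1995)".
YEAR_SUFFIX = r"\s*\((\d{4})\)\s*$"


def split_title(raw):
    """Plain-Python check of the pattern: return (title, year)."""
    match = re.search(YEAR_SUFFIX, raw)
    if match is None:
        return raw, None  # no trailing year: leave the title untouched
    return raw[:match.start()], match.group(1)


def add_title_and_year(df):
    """Hypothetical helper: apply the same pattern to a Spark DataFrame."""
    # Imported here so split_title can be tried without Spark installed.
    from pyspark.sql.functions import col, regexp_extract, regexp_replace

    return (
        df
        # Compute year first, while title still carries the "(1995)" suffix.
        .withColumn("year", regexp_extract(col("title"), YEAR_SUFFIX, 1))
        # Then strip the suffix; titles without a trailing year stay unchanged.
        .withColumn("title", regexp_replace(col("title"), YEAR_SUFFIX, ""))
    )


print(split_title("Toy Story (1995)"))  # ('Toy Story', '1995')
```

With the dataframe from the question this would be called as add_title_and_year(splited_genres_df).show(15). Rows whose title has no trailing year end up with an empty string in year, because regexp_extract returns an empty string when nothing matches.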
