How to remove rows from a table that are repeated in another table?
Good afternoon!
The gist of the problem: there is a large table and a small table. The small table contains some of the rows from the large one.
"Large":
  col1  col2  col3
0    A     1     5
1    B     2     6
2    C     3     7
3    D     4     8
4    C     3   102
"Small":
  col1  col2  col3
0    C     3     7
How can I remove from the large table the rows that also appear in the small one? (In this case, that is the row "C 3 7".)
It should look like this:
  col1  col2  col3
0    A     1     5
1    B     2     6
2    D     4     8
3    C     3   102
Ideally this should be done without loops, since the real tables contain hundreds of thousands of rows and many repeated values.
Thank you very much!
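import pandas as pd
# Sample data from the question
df1 = pd.DataFrame({'col1': ['A', 'B', 'C', 'D', 'C'], 'col2': [1, 2, 3, 4, 3], 'col3': [5, 6, 7, 8, 102]})
df1
df2 = pd.DataFrame({'col1': ['C'], 'col2': [3], 'col3': [7]})
df2
# Left merge on all shared columns keeps the row order of df1;
# indicator=True adds a '_merge' column that marks where each row was found
df_new = pd.merge(df1, df2, how='left', indicator=True)
# Keep only the rows that are absent from df2, drop the helper column, renumber the index
df_new.loc[df_new['_merge'] == 'left_only'].drop('_merge', axis=1).reset_index(drop=True)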
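The merge approach above can be wrapped in a small reusable function. This is only a sketch: the name drop_matching_rows is hypothetical, and it assumes both tables have the same columns, so that the merge compares whole rows. The small table is de-duplicated first, because otherwise a left merge could multiply matching rows.

```python
import pandas as pd


def drop_matching_rows(large, small):
    """Return the rows of `large` that have no exact match in `small`.

    Hypothetical helper; assumes both frames share the same columns.
    """
    # De-duplicate `small` so the left merge cannot multiply rows of `large`
    merged = large.merge(small.drop_duplicates(), how="left", indicator=True)
    # 'left_only' marks the rows that were found only in `large`
    kept = merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")
    return kept.reset_index(drop=True)


large = pd.DataFrame({"col1": ["A", "B", "C", "D", "C"],
                      "col2": [1, 2, 3, 4, 3],
                      "col3": [5, 6, 7, 8, 102]})
small = pd.DataFrame({"col1": ["C"], "col2": [3], "col3": [7]})

result = drop_matching_rows(large, small)
print(result)
```

Because the merge is a left join, the rows that survive stay in the order they had in the large table.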
Convert the DataFrames to lists (more precisely, to a two-dimensional format, i.e. a list of lists), filter them with a list comprehension that has a condition, then convert the result back to a DataFrame with the original column names.
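import pandas as pd
# Build the list of rows to exclude once, rather than on every iteration
rows_to_drop = df2.values.tolist()
new_list = [x for x in df1.values.tolist() if x not in rows_to_drop]
# columns is an attribute, not a method, so it is passed without calling it
df3 = pd.DataFrame.from_records(data=new_list, columns=df1.columns)
df3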
Out[217]:
  col1  col2  col3
0    A     1     5
1    B     2     6
2    D     4     8
3    C     3   102
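The list comprehension is still a Python-level loop and compares every row of the large table against every row of the small one, which may be slow on hundreds of thousands of rows. One possible loop-free variant of the same idea, offered as a sketch and not as part of the original answer, is to turn each row into a MultiIndex key and test membership with isin. The names large, small, mask and filtered are only illustrative, and the code assumes both tables have the same columns in the same order.

```python
import pandas as pd

large = pd.DataFrame({"col1": ["A", "B", "C", "D", "C"],
                      "col2": [1, 2, 3, 4, 3],
                      "col3": [5, 6, 7, 8, 102]})
small = pd.DataFrame({"col1": ["C"], "col2": [3], "col3": [7]})

# Each row becomes one composite key built from all of its columns
large_keys = pd.MultiIndex.from_frame(large)
small_keys = pd.MultiIndex.from_frame(small)

# Boolean array: True where a row of `large` also occurs in `small`
mask = large_keys.isin(small_keys)

# Keep the rows that are not in `small` and renumber the index
filtered = large.loc[~mask].reset_index(drop=True)
print(filtered)
```

Here only the row "C 3 7" is flagged by the mask, so "C 3 102" is kept even though it shares its first two values with the removed row.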