Python
mp3sh, 2016-01-20 23:38:54

Merging/joining tables (Python, pandas library)?

Please tell me how to properly merge two dataframes:

In [1]: import pandas as pd

In [2]: df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
   ...:                     'B': ['B0', 'B1', 'B2', 'B3'],
   ...:                     'C': ['C0', 'C1', 'C2', 'C3'],
   ...:                     'D': ['D0', 'D1', 'D2', 'D3']},
   ...:                     index=[0, 1, 2, 3])

In [3]: df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
   ...:                     'B': ['B4', 'B1', 'B2', 'B7'],
   ...:                     'C': ['C4', 'C1', 'C2', 'C7'],
   ...:                     'D': ['D4', 'D5', 'D6', 'D7']},
   ...:                     index=[0, 1, 2, 3])

The merge should work like this: every row of the first dataframe that has at least one identical element (an equal value in the same column) with a row of the second dataframe is replaced by that row of the second dataframe, and the rows of the second dataframe for which no match was found are simply appended to the end of the first dataframe.
Table 1 and table 2: [image]
As a result, I need to get: [image]
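To make the matching rule concrete, here is a minimal sketch that assumes "at least one identical element" means an equal value in the same column (the images with the expected tables did not survive, so this is an interpretation). It uses NumPy broadcasting to compare every row of df1 with every row of df2 at once; the name `match` is chosen for illustration only.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']})
df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
                    'B': ['B4', 'B1', 'B2', 'B7'],
                    'C': ['C4', 'C1', 'C2', 'C7'],
                    'D': ['D4', 'D5', 'D6', 'D7']})

a = df1.to_numpy()
b = df2[df1.columns].to_numpy()  # same column order as df1

# (4, 1, 4) vs (1, 4, 4) broadcasts to (4, 4, 4):
# same[i, j, k] is True when df1 row i and df2 row j agree in column k
same = a[:, None, :] == b[None, :, :]

# match[i, j] is True when the two rows share a value in at least one column
match = same.any(axis=2)
print(match.astype(int))
```

For the sample data only match[1, 1] and match[2, 2] are True (rows 1 and 2 share B1/C1 and B2/C2), so those two rows of df1 would be replaced and rows 0 and 3 of df2 appended.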


1 answer(s)
Sergey Paramonov, 2016-01-21
@varagian

You can try a head-on "declarative" approach:
a) df_{a,b,c,d} = join df2, df1 over {a,b,c,d} (one join per column: df_a on A, df_b on B, and so on)
b) project each df_{a,...} onto the required attributes from df2
c) take the indexes that are not in any df_{a,b,c,d} and put those entries into rest
d) concat(df_{a,b,c,d}, rest)
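The steps above can be sketched in pandas roughly as follows. This is one possible reading of a) to d), not necessarily what the answer's author had in mind. It assumes column-wise matching, that a df1 row matching several df2 rows takes the first one (lowest index), and that "rest" means the df2 rows that matched nothing. The names `left`, `right`, `pairs`, `i1`, `i2`, `replaced` and `rest` are hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']})
df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
                    'B': ['B4', 'B1', 'B2', 'B7'],
                    'C': ['C4', 'C1', 'C2', 'C7'],
                    'D': ['D4', 'D5', 'D6', 'D7']})

cols = list(df1.columns)
# Keep the original row labels as ordinary columns so they survive the joins.
left = df1.rename_axis('i1').reset_index()
right = df2.rename_axis('i2').reset_index()

# a) one inner join per column: (i1, i2) pairs whose value in that column is equal
matches = [left[['i1', c]].merge(right[['i2', c]], on=c)[['i1', 'i2']]
           for c in cols]
# Assumption: if a df1 row matches several df2 rows, the lowest i2 wins.
pairs = (pd.concat(matches, ignore_index=True)
         .sort_values(['i1', 'i2'])
         .drop_duplicates('i1'))

# b) take df2's attribute values for the matched rows and write them over df1
replaced = df1.copy()
if not pairs.empty:
    replaced.loc[pairs['i1'].to_numpy(), cols] = (
        df2.loc[pairs['i2'].to_numpy(), cols].to_numpy())

# c) df2 rows whose index took part in no join go into "rest"
rest = df2.loc[~df2.index.isin(pairs['i2']), cols]

# d) concatenate: updated df1 first, unmatched df2 rows at the end
result = pd.concat([replaced, rest], ignore_index=True)
print(result)
```

Writing the matched rows back in place, instead of concatenating the joined frames directly, keeps the original row order of df1, which the question seems to ask for.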
Or, more simply, imperatively:
write two loops over the row iterators (DataFrame.iterrows()) and walk through both datasets.
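A minimal sketch of the imperative variant, under the same assumptions as before (column-wise matching, the first matching df2 row wins, unmatched df2 rows go to the end); `rows` and `used` are hypothetical names. It performs len(df1) × len(df2) Python-level comparisons in the worst case, so it is fine for small frames but likely slow for large ones.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']})
df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
                    'B': ['B4', 'B1', 'B2', 'B7'],
                    'C': ['C4', 'C1', 'C2', 'C7'],
                    'D': ['D4', 'D5', 'D6', 'D7']})

rows = []     # output rows, in order
used = set()  # df2 indexes that replaced some df1 row
for i1, r1 in df1.iterrows():
    replacement = r1  # keep the df1 row unless a match is found
    for i2, r2 in df2.iterrows():
        # Both Series are labelled A..D, so == compares column by column
        if (r1 == r2).any():
            replacement = r2
            used.add(i2)
            break
    rows.append(replacement)

# df2 rows that matched nothing are appended at the end
rows.extend(r2 for i2, r2 in df2.iterrows() if i2 not in used)

result = pd.DataFrame(rows).reset_index(drop=True)
print(result)
```

`r1 == r2` requires both Series to have identical labels, which holds here because df1 and df2 have the same columns in the same order.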
Which will be easier to implement, and which will run faster? To be honest, it's not obvious, and it may depend on the data, so you'll need to try both.
