Python
Elick, 2021-11-18 09:31:29

Is it possible to drop duplicate rows in pandas while keeping the last few?

Hello, I'm wondering whether it's possible to drop duplicate rows in pandas while keeping a certain number of the last rows of each duplicate group.
The .drop_duplicates() method only lets you keep either the first or the last occurrence.
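One way to generalize keep='last' to n rows, sketched here as an assumption rather than a built-in drop_duplicates option, is to group by the columns that define a duplicate and call GroupBy.tail(n), which returns the last n rows of each group with their original index and order. The helper name keep_last_n and the sample data are hypothetical, and the sketch assumes the key columns contain no NaN, because groupby drops NaN keys by default while drop_duplicates does not.

```python
import pandas as pd


def keep_last_n(df: pd.DataFrame, n: int, subset=None) -> pd.DataFrame:
    """Keep at most the last n rows of each group of duplicate rows.

    Hypothetical helper: `subset` plays the same role as in drop_duplicates;
    when it is None, all columns decide whether two rows are duplicates.
    Assumes the key columns contain no NaN (groupby drops NaN keys by default).
    """
    keys = list(df.columns) if subset is None else subset
    # GroupBy.tail filters the original rows, so index and row order are preserved
    return df.groupby(keys, sort=False).tail(n)


df = pd.DataFrame({
    "column1": ["a", "a", "a", "b", "a", "b"],
    "value": [1, 1, 1, 2, 1, 2],
})
print(keep_last_n(df, 2))
```

With n=1 this should select the same rows as df.drop_duplicates(keep='last'); larger n simply keeps more of the trailing duplicates.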

1 answer(s)
Elick, 2021-11-18
@Elick

So far, the only thing that comes to mind is a list comprehension, but if unique_vals_clmn1 is very large, it can take a long time. The code keeps the last 100 rows for each unique value of column1, i.e. it drops duplicates by column1 while keeping the last 100 rows of each group.

import pandas as pd
unique_vals_clmn1 = df['column1'].unique().tolist()
# keep the last 100 rows of each column1 group, then restore the original row order
df = pd.concat([df[df['column1'] == val].tail(n=100) for val in unique_vals_clmn1]).sort_index()
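The list comprehension filters the whole DataFrame once per unique value, so its cost grows roughly with the number of unique values times the number of rows. A possibly faster alternative, swapped in here rather than taken from the answer, is df.groupby('column1').tail(100), which does the same per-group selection in a single grouped pass and already returns rows in their original order. The small frame and n = 2 below are illustrative assumptions; the snippet builds both results so they can be compared.

```python
import pandas as pd

df = pd.DataFrame({
    "column1": ["x", "y", "x", "x", "y", "x", "z"],
    "payload": range(7),
})
n = 2  # the answer uses 100

# Approach from the answer: one boolean filter per unique value, then restore order
per_value = pd.concat(
    [df[df["column1"] == val].tail(n=n) for val in df["column1"].unique()]
).sort_index()

# Grouped alternative: last n rows of each column1 group, original order preserved
grouped = df.groupby("column1", sort=False).tail(n)

print(grouped)
```

Both frames should contain the same rows with the same index, but the grouped version avoids rescanning the DataFrame for every unique value.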
