Is it possible to drop duplicate rows in pandas while keeping the last few?


Python

Elick, 2021-11-18 09:31:29

Hello, I'm wondering whether it's possible to drop duplicate rows in pandas while keeping a certain number of the last rows for each duplicated value.
The .drop_duplicates() method only lets you keep the first or the last row of each group of duplicates.


1 answer(s)


Elick, 2021-11-18
@Elick

So far, the only thing that comes to mind is a list comprehension, but if unique_vals_clmn1 is very large it can take a long time. The code keeps the last 100 rows for each unique value of column1, i.e. it removes duplicates by column1 while keeping the last 100 rows of each group.

import pandas as pd

unique_vals_clmn1 = df['column1'].unique().tolist()
# Keep the last 100 rows per unique value of column1; concat preserves the original index, so sort_index() restores the original row order
df = pd.concat([df[df['column1'] == val].tail(n=100) for val in unique_vals_clmn1]).sort_index()
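
A possible alternative to the per-value loop, offered as a sketch rather than a definitive answer: pandas' GroupBy.tail(n) returns the last n rows of each group while keeping the original index and row order, so it avoids filtering the whole frame once per unique value. The sample data and the name n_keep below are made up for illustration; column1 follows the snippet above.

```python
import pandas as pd

# Hypothetical sample data; only the column name column1 comes from the post above.
df = pd.DataFrame({
    "column1": ["a", "b", "a", "a", "b", "c", "a"],
    "value": [1, 2, 3, 4, 5, 6, 7],
})

n_keep = 2  # the answer above uses 100; a small number keeps the example readable

# Last n_keep rows of every column1 group, in the original row order and with the original index.
result = df.groupby("column1").tail(n_keep)
print(result)
```

With n_keep = 1 this should select the same rows as df.drop_duplicates(subset="column1", keep="last").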