Is it possible to drop duplicate rows in pandas, leaving the last few?
Hello, I'm wondering whether it is possible to drop duplicate rows in pandas while keeping a certain number of the last rows for each duplicated value.
The .drop_duplicates() method only lets you keep either the first or the last row.
So far, the only thing that comes to mind is a list comprehension, although it can take a long time if unique_vals_clmn1 is large.
The code below keeps the last 100 rows for each unique value of column1, i.e. it removes duplicates by column1 while keeping the last 100 rows per value.
# Distinct values of column1, in order of first appearance
unique_vals_clmn1 = df['column1'].unique().tolist()
# Take the last 100 rows for each value, then restore the original row order
df = pd.concat(
    [df[df['column1'] == unique_val_clmn1].tail(n=100) for unique_val_clmn1 in unique_vals_clmn1]
).sort_index()
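If the loop over unique values turns out to be too slow, a grouped tail may be worth trying instead. The sketch below is only an illustration: the helper name keep_last_n_per_key and the sample data are made up for this example, and n is set to 2 rather than 100 to keep the frame small. It relies on groupby(...).tail(n), which returns the last n rows of each group in their original order.

```python
import pandas as pd


def keep_last_n_per_key(df, key, n):
    """Keep at most the last n rows for each distinct value of df[key].

    Hypothetical helper for illustration; rows come back in their original order.
    """
    return df.groupby(key, sort=False).tail(n)


# Made-up sample data: 'a' appears three times, 'b' twice, 'c' once
df = pd.DataFrame({
    "column1": ["a", "b", "a", "a", "b", "c"],
    "value": [1, 2, 3, 4, 5, 6],
})

# Keep the last 2 rows per value of column1 (use n=100 for the case above)
result = keep_last_n_per_key(df, "column1", 2)
print(result)
```

With this sample only the first 'a' row is dropped, since 'a' is the only value that occurs more than twice.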