How to optimize (vectorize) a function in pandas referring to the previous row in the dataframe?
Greetings! I need help optimizing some code. The difference from the previous row's value has to go into a separate column. But if the value in the adjacent column (ind) differs from the one in the row above, the row starts a new group and its own value is used as is. Here is an example:
import pandas as pd
import numpy as np
data = {
'ind': [1,1,1,1,2,2,2,3,3,3],
'num': [3,6,8,10,2,5,8,3,4,12]
}
df = pd.DataFrame(data).assign(numResult=None)
df
   ind  num numResult
0    1    3      None
1    1    6      None
2    1    8      None
3    1   10      None
4    2    2      None
5    2    5      None
6    2    8      None
7    3    3      None
8    3    4      None
9    3   12      None
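beforeItem = None
result = []
for index, item in df.iterrows():
    if beforeItem is None:
        # first row of the frame: nothing to compare with
        item.numResult = item.num
    elif item.ind != beforeItem.ind:
        # 'ind' changed: treat the row as a new value
        item.numResult = item.num
    else:
        # same 'ind' as the row above: difference with the previous 'num'
        item.numResult = item.num - beforeItem.num
    result.append(item)
    beforeItem = item
df2 = pd.DataFrame(result)
df2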
   ind  num numResult
0    1    3         3
1    1    6         3
2    1    8         2
3    1   10         2
4    2    2         2
5    2    5         3
6    2    8         3
7    3    3         3
8    3    4         1
9    3   12         8
Can this be done with np.vectorize, for example? Going through iterrows row by row is slow on large data ... Or maybe there is another method?
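One possible approach, as a rough sketch rather than a definitive answer: the loop above only ever compares a row with the row directly above it, so it can probably be expressed with shift, diff and where on whole columns. Note that np.vectorize is documented as a convenience wrapper that is essentially a for loop, so it is unlikely to help with speed here. The snippet assumes the same df as in the question and that num is an integer column; the name same_as_prev is made up for illustration.

```python
import pandas as pd

data = {
    'ind': [1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    'num': [3, 6, 8, 10, 2, 5, 8, 3, 4, 12],
}
df = pd.DataFrame(data)

# True where 'ind' equals the 'ind' of the row above.
# The first row compares against NaN, so it comes out False.
same_as_prev = df['ind'].eq(df['ind'].shift())

# Row-to-row difference of 'num'; where 'ind' changed (or on the first row)
# fall back to the row's own 'num', as the loop in the question does.
num_result = df['num'].diff().where(same_as_prev, df['num'])

# diff() produces floats because of the leading NaN; cast back,
# assuming 'num' holds integers with no missing values.
df['numResult'] = num_result.astype(df['num'].dtype)

print(df)
```

If the rows of each ind are guaranteed to be contiguous, df.groupby('ind')['num'].diff().fillna(df['num']) should give the same result; the shift-based version follows the original loop more literally, because it also restarts when an ind value reappears later in the frame.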