Python
Kirill Petrov, 2020-05-20 21:07:49

How to optimize (vectorize) a function in pandas referring to the previous row in the dataframe?

Greetings! I'd like help optimizing some code. I need to put the difference from the previous row's value in a separate column. However, if the value in the adjacent column (ind) differs from the one in the row above, the row starts a new group and its own value should be used instead of a difference. Here is an example:

import pandas as pd
import numpy as np

data = {
    'ind': [1,1,1,1,2,2,2,3,3,3],
    'num': [3,6,8,10,2,5,8,3,4,12]
    }
df = pd.DataFrame(data).assign(numResult=None)
df

This gives the following table:
-	ind	num	numResult
0	1	3	None
1	1	6	None
2	1	8	None
3	1	10	None
4	2	2	None
5	2	5	None
6	2	8	None
7	3	3	None
8	3	4	None
9	3	12	None


By iterating over the rows, I get the desired result:
beforeItem = None
result = []
for index, item in df.iterrows():
  if beforeItem is None:
    item.numResult = item.num
  elif item.ind != beforeItem.ind:
    item.numResult = item.num
  else:
    item.numResult = item.num - beforeItem.num

  result.append(item)
  beforeItem = item


df2 = pd.DataFrame(result)
df2

-	ind	num	numResult
0	1	3	3
1	1	6	3
2	1	8	2
3	1	10	2
4	2	2	2
5	2	5	3
6	2	8	3
7	3	3	3
8	3	4	1
9	3	12	8


How can I vectorize this, for example with np.vectorize? Looping with iterrows is slow on large data. Or is there another approach?

For convenience, I posted the code here: https://colab.research.google.com/drive/1pnIgoLA7j...


1 answer(s)
Dmitry, 2020-05-20
@Recosh

>>> import pandas as pd
>>> data = {'ind': [1,1,1,1,2,2,2,3,3,3], 'num': [3,6,8,10,2,5,8,3,4,12]}
>>> df = pd.DataFrame(data)
>>> df['numResult'] = (df['num'] - df['num'].shift(1)).where(df['ind']==df['ind'].shift(1), other=df['num'])
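
To spell out what the one-liner above does, here is a minimal sketch that splits it into steps and compares it with a groupby-based variant. The names same_group, by_shift and by_group are made up for illustration, and the groupby variant is an alternative I am assuming fits the question, not something from the answer. It only matches the shift version when equal ind values sit next to each other, as they do in the sample data.

```python
import pandas as pd

data = {
    'ind': [1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    'num': [3, 6, 8, 10, 2, 5, 8, 3, 4, 12],
}
df = pd.DataFrame(data)

# Approach from the answer: compare each row with the one above it.
# The mask is False on the first row of every run of equal 'ind' values
# (including row 0, where the shifted value is NaN).
same_group = df['ind'] == df['ind'].shift(1)
by_shift = (df['num'] - df['num'].shift(1)).where(same_group, other=df['num'])

# Alternative (assumption: equal 'ind' values are contiguous):
# diff() inside each group is NaN on the group's first row, so fill it with 'num'.
by_group = df.groupby('ind')['num'].diff().fillna(df['num'])

# Both results are float64 because of the intermediate NaN; cast if integers are wanted.
df['numResult'] = by_shift.astype('int64')
print(df)
```

As for np.vectorize: the NumPy documentation describes it as a convenience wrapper that is essentially a for loop, so it is unlikely to help with speed here. Column-wise operations such as shift, where and diff are the ones that avoid the Python-level loop.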
