organica, 2020-07-16 13:38:37
Python

How to implement a loop for processing data?

Dear experts, please help me solve a routine work task. It torments me every month, I no longer have the strength to put up with it, and I can't manage to automate the calculation.
Briefly: there is an Excel table that shows the number of operations performed by each employee. The file is exported from the accounting system. When an employee works the night shift (from 20:00 to 08:00), his operations are split into two rows: from 20:00 to 24:00 and from 24:00 to 08:00. I regularly need to monitor each employee's average performance, but these split rows make my life miserable.

Initially I tried pandas: I loaded the Excel file into a DataFrame, then ran groupby().mean(), and it seemed to produce a result. However, when I rechecked the numbers, a problem came up: say an employee on the night shift did 2000 operations, half before midnight and half after. After aggregation it looks as if he averaged 1000 operations per shift. As a result, the monthly productivity is heavily underestimated. Correcting a list of 90+ employees by hand is not a great option.
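To make the problem concrete, here is a minimal sketch with hypothetical toy data (one night shift split at midnight into two rows, plus one day shift). The mean over rows comes out at about 1333 even though every real shift was 2000 operations; dividing the total by the number of real shifts gives the expected figure.

```python
import pandas as pd

# Hypothetical toy data: a night shift of 2000 operations split at midnight
# into two rows of 1000, plus a day shift of 2000 on a later date.
df = pd.DataFrame({
    "Personnel": ["Aleksey", "Aleksey", "Aleksey"],
    "Work_Date": pd.to_datetime(["2020-06-01", "2020-06-02", "2020-06-04"]),
    "Boxes_Picked_Picking": [1000, 1000, 2000],
})

# Mean over rows: the split night shift is counted as two shifts of 1000.
naive = df.groupby("Personnel")["Boxes_Picked_Picking"].mean()

# Mean over real shifts: 4000 operations spread over 2 shifts.
real_shifts = 2
per_shift = df["Boxes_Picked_Picking"].sum() / real_shifts

print(naive["Aleksey"])  # about 1333.33, although every shift was 2000
print(per_shift)         # 2000.0
```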
What I want but can't get:
If an employee works 2 calendar days in a row, the values in the work-amount column should be summed and written to a new DF in the format date, employee, number of operations. Sometimes an employee takes an extra shift and works 3 days in a row. In that case the third consecutive day should not be added to the sum but counted as a separate work shift (there is never a fourth consecutive day after the third).
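A minimal sketch of this rule in pandas, assuming the column names from the groupby call in this question (Personnel, Work_Date, Boxes_Picked_Picking) and that Work_Date can be parsed as a date. The function name merge_night_shifts is hypothetical. The idea: mark runs of consecutive dates per employee, number the days inside each run, then put days 0 and 1 into one shift and day 2 into its own. The sample rows are the first nine rows of Aleksey's grouped data.

```python
import pandas as pd


def merge_night_shifts(df, person="Personnel", date="Work_Date",
                       value="Boxes_Picked_Picking"):
    """Merge consecutive calendar days into shifts following the stated rule:
    in a run of consecutive dates, days 1 and 2 form one shift and day 3
    (if present) is a separate shift. Runs longer than 3 days are not
    expected; if one occurred, this sketch would pair days 3 and 4."""
    df = df.assign(**{date: pd.to_datetime(df[date])})
    # One row per employee per date, sorted so each run is contiguous.
    daily = (df.groupby([person, date], as_index=False)[value].sum()
               .sort_values([person, date], ignore_index=True))
    # A new run starts at an employee's first date or after a gap of more
    # than one day (diff() is NaT for the first row of each employee).
    gap = daily.groupby(person)[date].diff()
    run = (gap != pd.Timedelta(days=1)).cumsum().rename("run")
    # Position inside the run (0, 1, 2) halved: days 0 and 1 -> 0, day 2 -> 1.
    half = (daily.groupby(run).cumcount() // 2).rename("half")
    shifts = (daily.groupby([run, half])
                   .agg({person: "first", date: "first", value: "sum"})
                   .reset_index(drop=True))
    # date, employee, number of operations; each shift dated by its first day.
    return shifts[[date, person, value]]


raw = pd.DataFrame({
    "Personnel": ["Aleksey"] * 9,
    "Work_Date": ["2020-05-25", "2020-05-26", "2020-05-27",
                  "2020-05-29", "2020-05-30", "2020-05-31",
                  "2020-06-02", "2020-06-03", "2020-06-04"],
    "Boxes_Picked_Picking": [519, 223, 203, 265, 262, 510, 329, 486, 766],
})
shifts = merge_night_shifts(raw)
print(shifts)
```

On that sample the nine dates collapse into six shifts: 742 (dated 2020-05-25), 203, 527, 510, 815 and 766. Averages per employee can then be taken over these shift rows instead of the raw calendar-day rows.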

Grouped DF structure:

df1.groupby(['Personnel','Work_Date'])['Boxes_Picked_Picking'].sum().head(50)

Personnel         Work_Date 
Aleksey           2020-05-25     519
                  2020-05-26     223
                  2020-05-27     203
                  2020-05-29     265
                  2020-05-30     262
                  2020-05-31     510
                  2020-06-02     329
                  2020-06-03     486
                  2020-06-04     766
                  2020-06-07     343
                  2020-06-08     372
                  2020-06-10     563
                  2020-06-11     289
                  2020-06-12     547
                  2020-06-14     829
                  2020-06-19     290
                  2020-06-20     783
                  2020-06-22    1006
                  2020-06-23     467
                  2020-06-24     875
                  2020-06-25     571
                  2020-06-26      16
                  2020-06-27     292
                  2020-06-28     562
                  2020-07-01     378
                  2020-07-02     542
                  2020-07-04     832
                  2020-07-05     429
                  2020-07-06     599
                  2020-07-08      88
                  2020-07-09     328
                  2020-07-10     877
                  2020-07-12     916
                  2020-07-14     190


How can I solve this? Which direction should I look in?
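A different direction, applicable only if the raw export (before grouping) contains a start time for each row. The column name Start_Time below is an assumption, as is a day shift running from 08:00 to 20:00, and the data is hypothetical. Subtracting 8 hours from each timestamp moves both halves of a 20:00 to 08:00 night shift onto the date the shift began, so an ordinary groupby on that shifted date yields one row per shift with no pairing rule at all.

```python
import pandas as pd

# Hypothetical raw export: "Start_Time" is an assumed column name.
raw = pd.DataFrame({
    "Personnel": ["Aleksey", "Aleksey", "Aleksey"],
    "Start_Time": pd.to_datetime(["2020-06-01 20:00", "2020-06-02 00:00",
                                  "2020-06-03 08:00"]),
    "Boxes_Picked_Picking": [1000, 1000, 2000],
})

# Minus 8 hours: night shift 20:00-08:00 maps to 12:00-23:59 of the evening's
# date; an assumed day shift 08:00-20:00 maps to 00:00-12:00 of its own date.
raw["Shift_Date"] = (raw["Start_Time"] - pd.Timedelta(hours=8)).dt.normalize()

per_shift = (raw.groupby(["Personnel", "Shift_Date"], as_index=False)
                ["Boxes_Picked_Picking"].sum())
print(per_shift)
print(per_shift.groupby("Personnel")["Boxes_Picked_Picking"].mean())
```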


1 answer
Andy_U, 2020-07-16

You are obviously miscalculating the average.
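The answer is terse. One way to read it (an interpretation, not the answerer's code): the average should be total operations divided by the number of shifts, not the mean over rows. Under the rule in the question, a run of n consecutive dates (n at most 3) holds ceil(n / 2) shifts, so the shift count can be obtained without building a merged table. A sketch on the first nine rows of Aleksey's grouped data:

```python
import pandas as pd

# Daily totals for Aleksey, copied from the first 9 rows of the grouped output.
daily = pd.DataFrame({
    "Personnel": ["Aleksey"] * 9,
    "Work_Date": pd.to_datetime(["2020-05-25", "2020-05-26", "2020-05-27",
                                 "2020-05-29", "2020-05-30", "2020-05-31",
                                 "2020-06-02", "2020-06-03", "2020-06-04"]),
    "Boxes_Picked_Picking": [519, 223, 203, 265, 262, 510, 329, 486, 766],
}).sort_values(["Personnel", "Work_Date"], ignore_index=True)

# Runs of consecutive dates: a new run starts whenever the gap is not one day.
gap = daily.groupby("Personnel")["Work_Date"].diff()
daily["run"] = (gap != pd.Timedelta(days=1)).cumsum()

# A run of n days holds ceil(n / 2) shifts, computed as (n + 1) // 2.
run_len = daily.groupby(["Personnel", "run"]).size()
n_shifts = ((run_len + 1) // 2).groupby(level="Personnel").sum()

total = daily.groupby("Personnel")["Boxes_Picked_Picking"].sum()
avg_per_shift = total / n_shifts  # 3563 operations / 6 shifts
avg_per_row = daily.groupby("Personnel")["Boxes_Picked_Picking"].mean()  # 3563 / 9 rows
print(avg_per_shift, avg_per_row, sep="\n")
```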
