Python Pandas how to calculate mean with counting repetitions at the same time?

Python

GreenX5, 2020-09-10 01:29:26

I'm trying to group the data and average it, but the Name column disappears from the result. How do I keep it?
And how do I add a column with the number of occurrences?

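import pandas as pd

df = pd.DataFrame([['zet', 'z', '40%'], ['Iks', 'x', '10%'], ['Igrek', 'y', '5%'], ['Iks', 'x', '20%']], columns=['Name', 'Symbol', 'Value'])
print(df)
# Strip the '%' sign and convert Value to numbers
df = df.replace('%', '', regex=True)
df['Value'] = pd.to_numeric(df['Value'])
# Average Value per Symbol; the text column Name is dropped from the result
# (pandas >= 2.0 needs numeric_only=True here, otherwise mean() raises a TypeError)
df1 = df.groupby(['Symbol']).mean(numeric_only=True).sort_values(by=['Value'], ascending=False)
print(df1)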

PavelMos, 2020-09-10
@GreenX5

First of all, this program never creates a new column in the dataframe; to create one, you have to give it a name. Next: the average of what? If it is the average of Value, that is 75/4 = 18.75. Should that be written into every row of a new column? Then just
df['newcol'] = ...some computation...
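A minimal sketch of filling in the `df['newcol'] = ...` line, assuming the percentages have already been converted to numbers as in the question. The column names `mean_all` and `mean_by_symbol` are only illustrative. A scalar such as `df['Value'].mean()` is broadcast to every row, while `groupby(...).transform('mean')` returns the per-group mean aligned back to the original rows:

```python
import pandas as pd

df = pd.DataFrame(
    [['zet', 'z', '40%'], ['Iks', 'x', '10%'], ['Igrek', 'y', '5%'], ['Iks', 'x', '20%']],
    columns=['Name', 'Symbol', 'Value'],
)
# Strip the '%' sign and convert the strings to numbers
df['Value'] = pd.to_numeric(df['Value'].str.rstrip('%'))

# Overall mean (75 / 4 = 18.75) written into every row
df['mean_all'] = df['Value'].mean()

# Mean within each Symbol group, aligned back to the original rows
df['mean_by_symbol'] = df.groupby('Symbol')['Value'].transform('mean')
print(df)
```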
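UPD: the column (Name) disappears because pandas keeps only the grouping column Symbol (as the index of the result) and applies the aggregate function, in this case mean(), to all the remaining numeric columns. Keeping the other text columns in the result would be pointless: once the rows are grouped by Symbol, some of their values would simply be lost. (In pandas 2.0 and later, mean() raises a TypeError on the text column instead of silently dropping it, unless you pass numeric_only=True.)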
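If you do want to keep Name, one option (a sketch, assuming each Symbol always corresponds to a single Name, as in the sample data) is to group by both columns, so that Name becomes part of the group key instead of a column to aggregate. Named aggregation then computes the mean and the number of occurrences in one call; the output column name `Count` is just an illustrative choice:

```python
import pandas as pd

df = pd.DataFrame(
    [['zet', 'z', '40%'], ['Iks', 'x', '10%'], ['Igrek', 'y', '5%'], ['Iks', 'x', '20%']],
    columns=['Name', 'Symbol', 'Value'],
)
df['Value'] = pd.to_numeric(df['Value'].str.rstrip('%'))

# Group by both key columns so Name survives; aggregate Value two ways
df1 = (
    df.groupby(['Symbol', 'Name'])
      .agg(Value=('Value', 'mean'), Count=('Value', 'size'))
      .reset_index()                      # turn Symbol and Name back into columns
      .sort_values(by='Value', ascending=False)
)
print(df1)
```

If one Symbol can map to several Names, grouping by Symbol alone and adding `Name=('Name', 'first')` to the aggregation is an alternative, accepting that the other names for that Symbol are lost.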
As for adding two columns at once, search for "pandas add multiple columns". One way to do it, for example:

df['a'], df['b'] = list1, list2  # both lists must be the same length as the column
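A runnable version of that line, with two hypothetical lists `list1` and `list2` whose length matches the number of rows. `DataFrame.assign` is shown as an alternative that returns a new frame instead of modifying `df` in place:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['zet', 'Iks', 'Igrek', 'Iks']})

# Hypothetical data: each list must have exactly len(df) elements
list1 = [1, 2, 3, 4]
list2 = ['a', 'b', 'c', 'd']

# Tuple unpacking assigns both columns in one statement
df['a'], df['b'] = list1, list2

# Same columns, built on a copy without mutating the original frame
df2 = df[['Name']].assign(a=list1, b=list2)
print(df)
```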

However, since count() has to be computed and the matching count then written into every row with the corresponding value, the construction gets rather cumbersome. First, count the occurrences:

df.groupby(['Name']).size()
Out[95]: 
Name
Igrek    1
Iks      2
zet      1
dtype: int64
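For a single column, `Series.value_counts()` gives the same counts as `groupby(...).size()`; it sorts by frequency rather than by name, so the sketch below compares the two as dictionaries:

```python
import pandas as pd

df = pd.DataFrame(
    [['zet', 'z', '40%'], ['Iks', 'x', '10%'], ['Igrek', 'y', '5%'], ['Iks', 'x', '20%']],
    columns=['Name', 'Symbol', 'Value'],
)

sizes = df.groupby(['Name']).size()   # indexed by Name, sorted by Name
counts = df['Name'].value_counts()    # indexed by Name, sorted by count (descending)
print(counts)
```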

Then build a mapping from each element to its number of occurrences, and use that mapping to write the count into a new column for every row:

counts = df.groupby(['Name']).size()
counts.index.tolist()
counts.tolist()
d = dict(zip(counts.index.tolist(), counts.tolist()))
d
Out[98]: {'Igrek': 1, 'Iks': 2, 'zet': 1}
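The last step described above, writing the count into a new column, can be finished with `Series.map`, which looks up each row's Name in the dictionary `d`. As a shorter alternative that skips the intermediate dictionary, `groupby(...).transform('size')` returns the group size aligned to every row. The column names `Count` and `Count2` are only illustrative:

```python
import pandas as pd

df = pd.DataFrame(
    [['zet', 'z', '40%'], ['Iks', 'x', '10%'], ['Igrek', 'y', '5%'], ['Iks', 'x', '20%']],
    columns=['Name', 'Symbol', 'Value'],
)

# Mapping element -> number of occurrences, built as in the answer
counts = df.groupby(['Name']).size()
d = dict(zip(counts.index.tolist(), counts.tolist()))

# Look up each row's Name in the mapping
df['Count'] = df['Name'].map(d)

# Equivalent without the dictionary: group size broadcast back to every row
df['Count2'] = df.groupby('Name')['Value'].transform('size')
print(df)
```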