Examples of Aggregation and Grouping Operations in Pandas
1. Aggregation
(1) Computing with the built-in aggregation functions
1> Built-in aggregation functions
sum(), mean(), max(), min(), size(), describe(), and so on
2> Applying the aggregation functions
import numpy as np
import pandas as pd
# Create the DataFrame
dict_data = {
    'key1':['a','b','c','d','a','b','c','d'],
    'key2':['one','two','three','one','two','three','one','two'],
    'data1':np.random.randint(1,10,8),
    'data2':np.random.randint(1,10,8)
}
df = pd.DataFrame(dict_data)
print(df)
'''
data1 data2 key1 key2
0 3 4 a one
1 7 9 b two
2 5 7 c three
3 3 4 d one
4 8 7 a two
5 4 7 b three
6 8 9 c one
7 4 4 d two
'''
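# Group by key1 and sum each group
# (only the numeric columns are selected; recent pandas versions would otherwise concatenate the strings in key2)
print(df.groupby('key1')[['data1', 'data2']].sum())
'''
data1 data2
key1
a 11 11
b 11 16
c 13 16
d 7 8
'''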
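The example draws its data from np.random, so every run prints different numbers. The sketch below hard-codes the values from the first table printed above so the result can be checked. That choice is made only for reproducibility. The sketch then sums the numeric columns within each key1 group. It assumes a reasonably recent pandas (1.x or later).

```python
import pandas as pd

# Fixed values copied from the first table above, so the output is reproducible
df = pd.DataFrame({
    'key1': ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd'],
    'key2': ['one', 'two', 'three', 'one', 'two', 'three', 'one', 'two'],
    'data1': [3, 7, 5, 3, 8, 4, 8, 4],
    'data2': [4, 9, 7, 4, 7, 7, 9, 4],
})

# Sum only the numeric columns within each key1 group
sums = df.groupby('key1')[['data1', 'data2']].sum()
print(sums)
```

Each group here has exactly two rows, so every entry is simply the sum of two values from the table.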
# Built-in aggregation functions
# (select the numeric columns: in recent pandas, mean() raises a TypeError on the string column key2)
grouped = df.groupby('key1')[['data1', 'data2']]
print(grouped.sum())
print('*'*50)
print(grouped.max())
print('*'*50)
print(grouped.min())
print('*'*50)
print(grouped.mean())
print('*'*50)
# Number of rows in each group
print(grouped.size())
print('*'*50)
# Number of non-NaN values in each group
print(grouped.count())
print('*'*50)
print(grouped.describe())
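With random integers there are no missing values, so size() and count() print the same numbers. The sketch below uses a small hypothetical frame with one NaN to show the difference. size() counts every row in a group, while count() counts only the non-NaN values.

```python
import numpy as np
import pandas as pd

# Hypothetical data: group 'a' contains one missing value
df = pd.DataFrame({
    'key1': ['a', 'a', 'b'],
    'data1': [1.0, np.nan, 3.0],
})

grouped = df.groupby('key1')
sizes = grouped.size()             # rows per group, NaN rows included
counts = grouped['data1'].count()  # non-NaN values per group
print(sizes)
print(counts)
```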
(2) Computing with custom aggregation functions
Custom aggregation functions are applied through the agg() method.
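# Custom aggregation function
# agg() calls it on each column of each group, passing that column as a Series
def peak_range(series):
    # Returns max**2 - min**2 (the difference of the squares, not the plain range)
    return series.max()**2 - series.min()**2
# agg() assembles the aggregated results into a DataFrame
print(df.groupby('key1')[['data1', 'data2']].agg(peak_range))
# A lambda works too; this one computes the plain range max - min
print(df.groupby('key1')[['data1', 'data2']].agg(lambda s: s.max() - s.min()))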
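The sketch below uses fixed, hypothetical data so the results of a custom aggregation function can be checked. It also checks what agg() passes in. The helper record_type is purely illustrative, not part of pandas. It records the type of its argument and then returns the plain range.

```python
import pandas as pd

# Hypothetical fixed data, two rows per group
df = pd.DataFrame({
    'key1': ['a', 'a', 'b', 'b'],
    'data1': [3, 7, 1, 9],
    'data2': [4, 8, 3, 5],
})

def peak_range(series):
    # Same formula as peak_range above: max squared minus min squared
    return series.max() ** 2 - series.min() ** 2

seen_types = set()

def record_type(series):
    # Hypothetical helper: remember what agg() passed in, then return max - min
    seen_types.add(type(series).__name__)
    return series.max() - series.min()

squares = df.groupby('key1').agg(peak_range)
ranges = df.groupby('key1').agg(record_type)
print(squares)
print(ranges)
print(seen_types)
```

For group a, data1 holds 3 and 7, so peak_range gives 49 - 9 = 40, while the plain range is 4.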
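(3) Applying multiple aggregation functions; by default, the column labels are the function names
# Apply several aggregation functions; each result column is labeled with its function name by default
# A tuple ('label', function) gives that result column a custom label
print(df.groupby('key1')[['data1', 'data2']].agg(['sum', 'std', 'mean', ('range', peak_range)]))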
'''
data1 data2
sum std mean range sum std mean range
key1
a 10 2.828427 5.0 40 12 2.828427 6.0 48
b 10 5.656854 5.0 80 8 1.414214 4.0 16
c 6 1.414214 3.0 12 9 0.707107 4.5 9
d 15 0.707107 7.5 15 8 2.828427 4.0 32
'''
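The printed table above has two header rows. Passing a list of functions produces a two-level column index: the original column first, then the function label. A minimal sketch with hypothetical data shows how to list those columns and select one (column, function) pair with a tuple.

```python
import pandas as pd

# Hypothetical fixed data
df = pd.DataFrame({
    'key1': ['a', 'a', 'b', 'b'],
    'data1': [3, 7, 1, 9],
    'data2': [4, 8, 3, 5],
})

def peak_range(series):
    # max squared minus min squared, as in the example above
    return series.max() ** 2 - series.min() ** 2

result = df.groupby('key1').agg(['sum', 'mean', ('range', peak_range)])
# Columns form a MultiIndex of (original column, function label)
print(result.columns.tolist())
# Select one column/function pair with a tuple
print(result[('data1', 'range')])
```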
(4) Specifying the aggregation function for each column
# Map each column to the aggregation function it should use
print(df.groupby('key1').agg({'data1':'mean','data2':'sum'}))
'''
data1 data2
key1
a 5.0 12
b 5.0 8
c 3.0 9
d 7.5 8
'''
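A dict value can also be a list of functions, which again yields a two-level column index. If flat column names are preferred, pandas 0.25 and later also support "named aggregation" with keyword arguments of the form new_name=(column, function). The sketch below shows both forms on hypothetical data.

```python
import pandas as pd

# Hypothetical fixed data
df = pd.DataFrame({
    'key1': ['a', 'a', 'b', 'b'],
    'data1': [3, 7, 1, 9],
    'data2': [4, 8, 3, 5],
})

# data1 gets two functions, data2 gets one
result = df.groupby('key1').agg({'data1': ['min', 'max'], 'data2': 'sum'})
print(result)

# Named aggregation: output column name = (input column, function)
flat = df.groupby('key1').agg(data1_mean=('data1', 'mean'), data2_sum=('data2', 'sum'))
print(flat)
```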
2. Group-wise operations
(1) Grouping and adding a prefix to the result's column labels
Prefixes are added with the add_prefix('prefix') method.
# Create the DataFrame
dict_data = {
    'key1':['a','b','c','d','a','b','c','d'],
    'key2':['one','two','three','one','two','three','one','two'],
    'data1':np.random.randint(1,10,8),
    'data2':np.random.randint(1,10,8)
}
df = pd.DataFrame(dict_data)
print(df)
'''
data1 data2 key1 key2
0 1 5 a one
1 9 3 b two
2 3 6 c three
3 6 9 d one
4 8 4 a two
5 5 5 b three
6 9 6 c one
7 4 1 d two
'''
# Group by key1 and sum the numeric columns,
# then add the prefix 'sum_' to the result's column labels
k1_sum = df.groupby('key1')[['data1', 'data2']].sum().add_prefix('sum_')
print(k1_sum)
'''
sum_data1 sum_data2
key1
a 9 9
b 14 8
c 12 12
d 10 10
'''
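add_prefix() has a counterpart, add_suffix(). On a DataFrame, both rename the column labels and leave the index unchanged. The sketch below takes four rows from the table above as fixed, hypothetical input.

```python
import pandas as pd

# Hypothetical fixed data (rows 0, 1, 4, 5 of the table above)
df = pd.DataFrame({
    'key1': ['a', 'b', 'a', 'b'],
    'data1': [1, 9, 8, 5],
    'data2': [5, 3, 4, 5],
})

grouped = df.groupby('key1')[['data1', 'data2']]
with_prefix = grouped.sum().add_prefix('sum_')     # sum_data1, sum_data2
with_suffix = grouped.mean().add_suffix('_mean')   # data1_mean, data2_mean
print(with_prefix)
print(with_suffix)
```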
(2) Grouping and merging the results back into the original data
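# Merge the aggregated results back into the original data
# Argument 1: the original data, joined on its key1 column (left_on='key1')
# Argument 2: the aggregated results, joined on their index, which holds key1 (right_index=True)
merged = pd.merge(df, k1_sum, left_on='key1', right_index=True)
print(merged)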
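The sketch below runs the merge on fixed, hypothetical data. pd.merge defaults to an inner join. The row order of an inner merge has differed across pandas versions, so the checks sort the result first. It also shows DataFrame.join(other, on='key1'), an alternative that matches df's key1 column against k1_sum's index. As a left join, it keeps df's original row order.

```python
import pandas as pd

# Hypothetical fixed data
df = pd.DataFrame({
    'key1': ['a', 'b', 'a', 'b'],
    'data1': [1, 9, 8, 5],
    'data2': [5, 3, 4, 5],
})
k1_sum = df.groupby('key1')[['data1', 'data2']].sum().add_prefix('sum_')

# Inner join: df's key1 column against k1_sum's index
merged = pd.merge(df, k1_sum, left_on='key1', right_index=True)
print(merged)

# Left join on the same keys, preserving df's row order
joined = df.join(k1_sum, on='key1')
print(joined)
```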
(3) Using transform() to lay out the results in the original row order as a DataFrame
# transform() returns one result per original row, in the original row order,
# so the output can be attached to df directly
# key2 holds strings, so its 'sum' concatenates them (this produces sum_key2)
k1_sum_tf = df.groupby('key1').transform('sum').add_prefix('sum_')
# print(k1_sum_tf.columns)
# Append the result columns to the original data
df[k1_sum_tf.columns] = k1_sum_tf
print(df)
'''
data1 data2 key1 key2 sum_data1 sum_data2 sum_key2
0 5 4 a one 9 12 onetwo
1 3 3 b two 5 12 twothree
2 9 2 c three 14 9 threeone
3 6 5 d one 11 9 onetwo
4 4 8 a two 9 12 onetwo
5 2 9 b three 5 12 twothree
6 5 7 c one 14 9 threeone
7 5 4 d two 11 9 onetwo
'''
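transform() keeps the original index, so its result lines up row by row with df. That makes row-level calculations against group statistics a one-liner. The sketch below is one common use, each row's share of its group total, shown on hypothetical data.

```python
import pandas as pd

# Hypothetical fixed data
df = pd.DataFrame({
    'key1': ['a', 'b', 'a', 'b'],
    'data1': [1, 9, 8, 5],
})

# One group total per original row, aligned with df's index
group_total = df.groupby('key1')['data1'].transform('sum')
print(group_total.tolist())

# Each row's share of its group's total
df['share'] = df['data1'] / group_total
print(df)
```

The shares within each group add up to 1, so summing the whole column gives the number of groups.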
That is all for this article. I hope it helps with your studies, and thank you for supporting 脚本之家.