Python Pandas – Aggregations

Once a rolling, expanding, or ewm object has been created, it offers several methods for aggregating the underlying data.
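As a minimal sketch of what these three window objects do, the snippet below builds each of them on a tiny Series and takes the mean. The Series values and the ewm parameters (alpha=0.5, adjust=False) are illustrative assumptions chosen so that the results are easy to check by hand; they are not part of the original example.

```python
import pandas as pd

# Small, deterministic data (illustrative only)
s = pd.Series([1.0, 2.0, 3.0, 4.0])

# Mean of the last 2 values; the first row has no full window yet
rolling_mean = s.rolling(window=2).mean()

# Mean of all values seen so far
expanding_mean = s.expanding().mean()

# Exponentially weighted mean: y[t] = 0.5 * y[t-1] + 0.5 * x[t]
ewm_mean = s.ewm(alpha=0.5, adjust=False).mean()

print(rolling_mean.tolist())    # [nan, 1.5, 2.5, 3.5]
print(expanding_mean.tolist())  # [1.0, 1.5, 2.0, 2.5]
print(ewm_mean.tolist())        # [1.0, 1.5, 2.25, 3.125]
```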

Applying Aggregations on a DataFrame:

Let us create a DataFrame and a rolling object on which to apply aggregations.

import pandas as pd
import numpy as np

# 10 rows of random values, indexed by consecutive days
df = pd.DataFrame(np.random.randn(10, 4),
                  index=pd.date_range('1/1/2000', periods=10),
                  columns=['A', 'B', 'C', 'D'])
print(df)

# Rolling window over 3 rows; min_periods=1 allows partial windows at the start
r = df.rolling(window=3, min_periods=1)
print(r)

Its output is similar to the following; the values are random, and the exact text of the last line depends on the pandas version:

                   A         B         C         D
2000-01-01  1.088512 -0.650942 -2.547450 -0.566858
2000-01-02  0.790670 -0.387854 -0.668132  0.267283
2000-01-03 -0.575523 -0.965025  0.060427 -2.179780
2000-01-04  1.669653  1.211759 -0.254695  1.429166
2000-01-05  0.100568 -0.236184  0.491646 -0.466081
2000-01-06  0.155172  0.992975 -1.205134  0.320958
2000-01-07  0.309468 -0.724053 -1.412446  0.627919
2000-01-08  0.099489 -1.028040  0.163206 -1.274331
2000-01-09  1.639500 -0.068443  0.714008 -0.565969
2000-01-10  0.326761  1.479841  0.664282 -1.361169

Rolling [window=3,min_periods=1,center=False,axis=0]
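To make the role of min_periods concrete, here is a small sketch on made-up values (not the random data above). It compares a window of 3 with the default min_periods, which for a fixed-size window equals the window size, against the min_periods=1 setting used in this chapter.

```python
import pandas as pd

# Illustrative values, chosen so the sums are easy to verify
s = pd.Series([10.0, 20.0, 30.0, 40.0])

# Default: a result needs a full window of 3 observations
strict = s.rolling(window=3).sum()

# min_periods=1: a result is produced as soon as 1 observation is available
lenient = s.rolling(window=3, min_periods=1).sum()

print(strict.tolist())   # [nan, nan, 60.0, 90.0]
print(lenient.tolist())  # [10.0, 30.0, 60.0, 90.0]
```

This is why the first two rows of the aggregated outputs in this chapter contain values rather than NaN.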

We can aggregate by passing a function to the rolling object built on the entire DataFrame, or first select one or more columns with the standard getitem ([]) syntax.

Apply Aggregation on a Whole DataFrame:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(10, 4),
                  index=pd.date_range('1/1/2000', periods=10),
                  columns=['A', 'B', 'C', 'D'])
print(df)

r = df.rolling(window=3, min_periods=1)

# Rolling sum of every column; the string 'sum' can be passed instead of np.sum
print(r.aggregate(np.sum))

Its output is as follows (shown for the same random values as above):

                   A         B         C         D
2000-01-01  1.088512 -0.650942 -2.547450 -0.566858
2000-01-02  0.790670 -0.387854 -0.668132  0.267283
2000-01-03 -0.575523 -0.965025  0.060427 -2.179780
2000-01-04  1.669653  1.211759 -0.254695  1.429166
2000-01-05  0.100568 -0.236184  0.491646 -0.466081
2000-01-06  0.155172  0.992975 -1.205134  0.320958
2000-01-07  0.309468 -0.724053 -1.412446  0.627919
2000-01-08  0.099489 -1.028040  0.163206 -1.274331
2000-01-09  1.639500 -0.068443  0.714008 -0.565969
2000-01-10  0.326761  1.479841  0.664282 -1.361169

                   A         B         C         D
2000-01-01  1.088512 -0.650942 -2.547450 -0.566858
2000-01-02  1.879182 -1.038796 -3.215581 -0.299575
2000-01-03  1.303660 -2.003821 -3.155154 -2.479355
2000-01-04  1.884801 -0.141119 -0.862400 -0.483331
2000-01-05  1.194699  0.010551  0.297378 -1.216695
2000-01-06  1.925393  1.968551 -0.968183  1.284044
2000-01-07  0.565208  0.032738 -2.125934  0.482797
2000-01-08  0.564129 -0.759118 -2.454374 -0.325454
2000-01-09  2.048458 -1.820537 -0.535232 -1.212381
2000-01-10  2.065750  0.383357  1.541496 -3.201469
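For a built-in reduction such as the sum, aggregate, its shorter alias agg, and the dedicated sum method should all give the same result. The sketch below checks this on a small made-up DataFrame; the data is an assumption for illustration only.

```python
import pandas as pd

# Illustrative data so that the rolling sums can be checked by hand
df = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0],
                   "B": [10.0, 20.0, 30.0, 40.0]})
r = df.rolling(window=3, min_periods=1)

via_aggregate = r.aggregate("sum")  # function given by name
via_agg = r.agg("sum")              # agg is an alias of aggregate
via_method = r.sum()                # dedicated method

print(via_method["A"].tolist())             # [1.0, 3.0, 6.0, 9.0]
print(via_aggregate.equals(via_method))     # True
print(via_agg.equals(via_method))           # True
```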

Apply Aggregation on a Single Column of a DataFrame:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(10, 4),
                  index=pd.date_range('1/1/2000', periods=10),
                  columns=['A', 'B', 'C', 'D'])
print(df)

r = df.rolling(window=3, min_periods=1)

# Select column A on the rolling object, then compute its rolling sum
print(r['A'].aggregate(np.sum))

Its output is as follows (shown for the same random values as above):

                   A         B         C         D
2000-01-01  1.088512 -0.650942 -2.547450 -0.566858
2000-01-02  0.790670 -0.387854 -0.668132  0.267283
2000-01-03 -0.575523 -0.965025  0.060427 -2.179780
2000-01-04  1.669653  1.211759 -0.254695  1.429166
2000-01-05  0.100568 -0.236184  0.491646 -0.466081
2000-01-06  0.155172  0.992975 -1.205134  0.320958
2000-01-07  0.309468 -0.724053 -1.412446  0.627919
2000-01-08  0.099489 -1.028040  0.163206 -1.274331
2000-01-09  1.639500 -0.068443  0.714008 -0.565969
2000-01-10  0.326761  1.479841  0.664282 -1.361169

2000-01-01    1.088512
2000-01-02    1.879182
2000-01-03    1.303660
2000-01-04    1.884801
2000-01-05    1.194699
2000-01-06    1.925393
2000-01-07    0.565208
2000-01-08    0.564129
2000-01-09    2.048458
2000-01-10    2.065750
Freq: D, Name: A, dtype: float64
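Selecting a single column on the rolling object yields a Series, as the output above shows. The following sketch, on illustrative data, suggests that this is interchangeable with selecting the column first and creating the rolling object afterwards.

```python
import pandas as pd

# Illustrative data (not the random values used above)
df = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0],
                   "B": [5.0, 6.0, 7.0, 8.0]})
r = df.rolling(window=2, min_periods=1)

# Select the column on the rolling object
from_window = r["A"].sum()

# Select the column on the DataFrame, then create the rolling object
from_frame = df["A"].rolling(window=2, min_periods=1).sum()

print(type(from_window).__name__)      # Series
print(from_window.name)                # A
print(from_window.equals(from_frame))  # True
```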

Apply Aggregation on Multiple Columns of a DataFrame:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(10, 4),
                  index=pd.date_range('1/1/2000', periods=10),
                  columns=['A', 'B', 'C', 'D'])
print(df)

r = df.rolling(window=3, min_periods=1)

# Select columns A and B with a list, then compute their rolling sums
print(r[['A', 'B']].aggregate(np.sum))

Its output is as follows (shown for the same random values as above):

                   A         B         C         D
2000-01-01  1.088512 -0.650942 -2.547450 -0.566858
2000-01-02  0.790670 -0.387854 -0.668132  0.267283
2000-01-03 -0.575523 -0.965025  0.060427 -2.179780
2000-01-04  1.669653  1.211759 -0.254695  1.429166
2000-01-05  0.100568 -0.236184  0.491646 -0.466081
2000-01-06  0.155172  0.992975 -1.205134  0.320958
2000-01-07  0.309468 -0.724053 -1.412446  0.627919
2000-01-08  0.099489 -1.028040  0.163206 -1.274331
2000-01-09  1.639500 -0.068443  0.714008 -0.565969
2000-01-10  0.326761  1.479841  0.664282 -1.361169

                   A         B
2000-01-01  1.088512 -0.650942
2000-01-02  1.879182 -1.038796
2000-01-03  1.303660 -2.003821
2000-01-04  1.884801 -0.141119
2000-01-05  1.194699  0.010551
2000-01-06  1.925393  1.968551
2000-01-07  0.565208  0.032738
2000-01-08  0.564129 -0.759118
2000-01-09  2.048458 -1.820537
2000-01-10  2.065750  0.383357
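The function passed to aggregate does not have to come from NumPy. As a sketch, the hypothetical helper value_range below (a name introduced here purely for illustration) computes the spread of each window, and is applied to two selected columns of a made-up DataFrame.

```python
import pandas as pd

# Illustrative data; column C exists only to show that it is left out
df = pd.DataFrame({"A": [1.0, 5.0, 2.0, 8.0],
                   "B": [3.0, 3.0, 7.0, 1.0],
                   "C": [0.0, 0.0, 0.0, 0.0]})
r = df.rolling(window=2, min_periods=1)


def value_range(window):
    """Hypothetical helper: largest minus smallest value in one window."""
    return window.max() - window.min()


# The custom function is evaluated once per window and per selected column
spread = r[["A", "B"]].aggregate(value_range)

print(list(spread.columns))  # ['A', 'B']
print(spread["A"].tolist())  # [0.0, 4.0, 3.0, 6.0]
print(spread["B"].tolist())  # [0.0, 0.0, 4.0, 6.0]
```

A custom Python function is called for every window, so it is usually slower than a built-in reduction such as sum or mean.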

Apply Multiple Functions on a Single Column of a DataFrame:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(10, 4),
                  index=pd.date_range('1/1/2000', periods=10),
                  columns=['A', 'B', 'C', 'D'])
print(df)

r = df.rolling(window=3, min_periods=1)

# A list of functions produces one result column per function
print(r['A'].aggregate([np.sum, np.mean]))

Its output is as follows (shown for the same random values as above):

                   A         B         C         D
2000-01-01  1.088512 -0.650942 -2.547450 -0.566858
2000-01-02  0.790670 -0.387854 -0.668132  0.267283
2000-01-03 -0.575523 -0.965025  0.060427 -2.179780
2000-01-04  1.669653  1.211759 -0.254695  1.429166
2000-01-05  0.100568 -0.236184  0.491646 -0.466081
2000-01-06  0.155172  0.992975 -1.205134  0.320958
2000-01-07  0.309468 -0.724053 -1.412446  0.627919
2000-01-08  0.099489 -1.028040  0.163206 -1.274331
2000-01-09  1.639500 -0.068443  0.714008 -0.565969
2000-01-10  0.326761  1.479841  0.664282 -1.361169

                 sum      mean
2000-01-01  1.088512  1.088512
2000-01-02  1.879182  0.939591
2000-01-03  1.303660  0.434553
2000-01-04  1.884801  0.628267
2000-01-05  1.194699  0.398233
2000-01-06  1.925393  0.641798
2000-01-07  0.565208  0.188403
2000-01-08  0.564129  0.188043
2000-01-09  2.048458  0.682819
2000-01-10  2.065750  0.688583
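In the output above, the mean equals the sum only in the first row, because with min_periods=1 the first window holds a single observation. The sketch below, on illustrative data, adds the count of observations per window to make that relationship visible.

```python
import pandas as pd

# Illustrative data (not the random values used above)
df = pd.DataFrame({"A": [2.0, 4.0, 6.0, 8.0]})
r = df.rolling(window=3, min_periods=1)

# Function names given as strings become the result's column labels
stats = r["A"].agg(["sum", "mean", "count"])
print(stats)

# Largest deviation between mean and sum / count; expected to be (close to) zero
max_gap = (stats["mean"] - stats["sum"] / stats["count"]).abs().max()
print(max_gap < 1e-9)  # True
```

Here the windows contain 1, 2, 3 and 3 observations, so the sums 2, 6, 12 and 18 correspond to the means 2, 3, 4 and 6.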

Apply Multiple Functions on Multiple Columns of a DataFrame:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(10, 4),
                  index=pd.date_range('1/1/2000', periods=10),
                  columns=['A', 'B', 'C', 'D'])
print(df)

r = df.rolling(window=3, min_periods=1)

# Every function is applied to every selected column
print(r[['A', 'B']].aggregate([np.sum, np.mean]))

Its output is as follows (shown for the same random values as above):

                   A         B         C         D
2000-01-01  1.088512 -0.650942 -2.547450 -0.566858
2000-01-02  0.790670 -0.387854 -0.668132  0.267283
2000-01-03 -0.575523 -0.965025  0.060427 -2.179780
2000-01-04  1.669653  1.211759 -0.254695  1.429166
2000-01-05  0.100568 -0.236184  0.491646 -0.466081
2000-01-06  0.155172  0.992975 -1.205134  0.320958
2000-01-07  0.309468 -0.724053 -1.412446  0.627919
2000-01-08  0.099489 -1.028040  0.163206 -1.274331
2000-01-09  1.639500 -0.068443  0.714008 -0.565969
2000-01-10  0.326761  1.479841  0.664282 -1.361169

                   A                   B
                 sum      mean       sum      mean
2000-01-01  1.088512  1.088512 -0.650942 -0.650942
2000-01-02  1.879182  0.939591 -1.038796 -0.519398
2000-01-03  1.303660  0.434553 -2.003821 -0.667940
2000-01-04  1.884801  0.628267 -0.141119 -0.047040
2000-01-05  1.194699  0.398233  0.010551  0.003517
2000-01-06  1.925393  0.641798  1.968551  0.656184
2000-01-07  0.565208  0.188403  0.032738  0.010913
2000-01-08  0.564129  0.188043 -0.759118 -0.253039
2000-01-09  2.048458  0.682819 -1.820537 -0.606846
2000-01-10  2.065750  0.688583  0.383357  0.127786
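The two header rows in the last output indicate hierarchical (MultiIndex) columns: the column name on top and the function name below it. The following sketch, on illustrative data, shows one way to pick results out of such a frame. It also shows the closely related dictionary form of aggregate, which applies a different function to each column and keeps the columns flat.

```python
import pandas as pd

# Illustrative data (not the random values used above)
df = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0],
                   "B": [10.0, 20.0, 30.0, 40.0]})
r = df.rolling(window=2, min_periods=1)

# List of functions on several columns: (column, function) pairs as labels
result = r[["A", "B"]].agg(["sum", "mean"])
print(list(result.columns))
# [('A', 'sum'), ('A', 'mean'), ('B', 'sum'), ('B', 'mean')]

# One result as a Series, addressed by a (column, function) tuple
print(result[("A", "mean")].tolist())  # [1.0, 1.5, 2.5, 3.5]

# All results for one column as a DataFrame with columns 'sum' and 'mean'
print(result["B"]["sum"].tolist())     # [10.0, 30.0, 50.0, 70.0]

# Dictionary form: one function per column, flat column labels
per_column = r.agg({"A": "sum", "B": "mean"})
print(per_column["A"].tolist())        # [1.0, 3.0, 5.0, 7.0]
print(per_column["B"].tolist())        # [10.0, 15.0, 25.0, 35.0]
```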