How to Get the Sum of a Grouped Dataframe in Pandas

When you manipulate data with the Pandas library in Python, you will often need to group rows and total a specific column within each group. The groupby function in Pandas handles this. In this article, we will explore how to use groupby to group your data and calculate the sum of a column.

Understanding the Problem

To better understand the problem, consider the following example dataframe:


import pandas as pd

data = {
    'Fruit': ['Apples', 'Apples', 'Apples', 'Apples', 'Apples', 'Oranges', 'Oranges', 'Oranges', 'Oranges', 'Oranges', 'Grapes', 'Grapes', 'Grapes', 'Grapes', 'Grapes'],
    'Date': ['10/6/2016', '10/6/2016', '10/6/2016', '10/7/2016', '10/7/2016', '10/7/2016', '10/6/2016', '10/6/2016', '10/6/2016', '10/7/2016', '10/7/2016', '10/7/2016', '10/7/2016', '10/7/2016', '10/7/2016'],
    'Name': ['Bob', 'Bob', 'Mike', 'Steve', 'Bob', 'Bob', 'Tom', 'Mike', 'Bob', 'Tony', 'Bob', 'Tom', 'Bob', 'Bob', 'Tony'],
    'Number': [7, 8, 9, 10, 1, 2, 15, 57, 65, 1, 1, 87, 22, 12, 15]
}

df = pd.DataFrame(data)

The dataframe contains information about fruits, dates, names, and numbers. We want to calculate the total number of each fruit for each name. For example, for the name "Bob" and fruit "Apples", the matching rows hold 7, 8, and 1, so the total is 16.
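Before grouping anything, the single total quoted above can be checked by hand with a boolean filter. The following is a minimal sketch; it rebuilds the example dataframe without the Date column (which the calculation does not use), and the variable names bob_apples and total are chosen here for illustration.

```python
import pandas as pd

# Same Fruit, Name, and Number values as the example dataframe (Date omitted).
df = pd.DataFrame({
    'Fruit': ['Apples'] * 5 + ['Oranges'] * 5 + ['Grapes'] * 5,
    'Name': ['Bob', 'Bob', 'Mike', 'Steve', 'Bob', 'Bob', 'Tom', 'Mike',
             'Bob', 'Tony', 'Bob', 'Tom', 'Bob', 'Bob', 'Tony'],
    'Number': [7, 8, 9, 10, 1, 2, 15, 57, 65, 1, 1, 87, 22, 12, 15],
})

# Keep only the rows where Name is "Bob" and Fruit is "Apples".
bob_apples = df[(df['Name'] == 'Bob') & (df['Fruit'] == 'Apples')]

# Sum the Number column of those rows: 7 + 8 + 1.
total = bob_apples['Number'].sum()
print(total)  # 16
```

Repeating this filter for every name and fruit pair would be tedious, which is the work that grouping automates.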

Solution: Grouping and Aggregating Data

To solve this problem, we can use the groupby function in Pandas to group the data by the "Name" and "Fruit" columns, and then use the sum function to calculate the sum of the "Number" column.


grouped_data = df.groupby(['Name', 'Fruit'])['Number'].sum()
print(grouped_data)

The above code will group the data by the "Name" and "Fruit" columns and then calculate the sum of the "Number" column for each group. The result will be a Series object with multi-level indexing.
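If a flat table is easier to work with than a multi-level index, the group keys can be turned back into ordinary columns. The sketch below shows two common ways to do that, reset_index and the as_index=False argument of groupby; the dataframe is rebuilt without the Date column, and the names totals, flat, and flat_alt are illustrative.

```python
import pandas as pd

# Same Fruit, Name, and Number values as the example dataframe (Date omitted).
df = pd.DataFrame({
    'Fruit': ['Apples'] * 5 + ['Oranges'] * 5 + ['Grapes'] * 5,
    'Name': ['Bob', 'Bob', 'Mike', 'Steve', 'Bob', 'Bob', 'Tom', 'Mike',
             'Bob', 'Tony', 'Bob', 'Tom', 'Bob', 'Bob', 'Tony'],
    'Number': [7, 8, 9, 10, 1, 2, 15, 57, 65, 1, 1, 87, 22, 12, 15],
})

# Series with a (Name, Fruit) multi-level index.
totals = df.groupby(['Name', 'Fruit'])['Number'].sum()

# Option 1: move the index levels back into regular columns.
flat = totals.reset_index()

# Option 2: ask groupby not to use the keys as the index in the first place.
flat_alt = df.groupby(['Name', 'Fruit'], as_index=False)['Number'].sum()

print(flat)
```

Both options should give a dataframe with the columns Name, Fruit, and Number, one row per group.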

Example Output

When we run the above code, we will get the following output:


Name   Fruit 
Bob    Apples     16
       Grapes     35
       Oranges    67
Mike   Apples      9
       Oranges    57
Steve  Apples     10
Tom    Grapes     87
       Oranges    15
Tony   Grapes     15
       Oranges     1
Name: Number, dtype: int64

The output shows the total number of fruit for each name and fruit combination.
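Individual totals can be read from the grouped result by indexing with a (name, fruit) pair, and the result can be pivoted into a name-by-fruit table. This is a small sketch rather than the only approach; the dataframe is rebuilt without the Date column, and the names totals, bob_totals, and table are illustrative.

```python
import pandas as pd

# Same Fruit, Name, and Number values as the example dataframe (Date omitted).
df = pd.DataFrame({
    'Fruit': ['Apples'] * 5 + ['Oranges'] * 5 + ['Grapes'] * 5,
    'Name': ['Bob', 'Bob', 'Mike', 'Steve', 'Bob', 'Bob', 'Tom', 'Mike',
             'Bob', 'Tony', 'Bob', 'Tom', 'Bob', 'Bob', 'Tony'],
    'Number': [7, 8, 9, 10, 1, 2, 15, 57, 65, 1, 1, 87, 22, 12, 15],
})

totals = df.groupby(['Name', 'Fruit'])['Number'].sum()

# Look up one combination with a (Name, Fruit) tuple.
print(totals.loc[('Bob', 'Apples')])  # 16

# Select every fruit total for one name; the result is indexed by Fruit.
bob_totals = totals.loc['Bob']
print(bob_totals)

# Pivot the inner level (Fruit) into columns; combinations that never
# occur in the data, such as Steve and Grapes, are filled with 0.
table = totals.unstack(fill_value=0)
print(table)
```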

Explanation

The groupby function is applied to the dataframe and takes the columns 'Name' and 'Fruit' as the grouping variables. This creates a GroupBy object which can be used to perform various operations on the groups.

We then specify the column 'Number' to be aggregated using the sum function. This calculates the sum of the 'Number' column for each group.
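To make the two steps more concrete, the intermediate GroupBy object can be inspected before any aggregation is applied. The sketch below assumes the same example data (rebuilt without the Date column); the names groups and bob_grapes are illustrative.

```python
import pandas as pd

# Same Fruit, Name, and Number values as the example dataframe (Date omitted).
df = pd.DataFrame({
    'Fruit': ['Apples'] * 5 + ['Oranges'] * 5 + ['Grapes'] * 5,
    'Name': ['Bob', 'Bob', 'Mike', 'Steve', 'Bob', 'Bob', 'Tom', 'Mike',
             'Bob', 'Tony', 'Bob', 'Tom', 'Bob', 'Bob', 'Tony'],
    'Number': [7, 8, 9, 10, 1, 2, 15, 57, 65, 1, 1, 87, 22, 12, 15],
})

# Step 1: split the rows into groups. Nothing is summed yet.
groups = df.groupby(['Name', 'Fruit'])
print(type(groups).__name__)  # DataFrameGroupBy
print(groups.ngroups)         # 10 distinct (Name, Fruit) combinations

# Peek at the rows that fall into one group.
bob_grapes = groups.get_group(('Bob', 'Grapes'))
print(bob_grapes)

# Step 2: select the column to aggregate and apply sum to each group.
totals = groups['Number'].sum()
print(totals.loc[('Bob', 'Grapes')])  # 1 + 22 + 12 = 35
```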

Additional Operations

In addition to calculating the sum, the groupby function can be used to perform other aggregations, such as calculating the mean, maximum, minimum, etc. You can specify multiple aggregation functions by passing a list of functions to the agg method.


grouped_data = df.groupby(['Name', 'Fruit'])['Number'].agg(['sum', 'mean', 'max', 'min'])
print(grouped_data)

The above code will calculate the sum, mean, maximum, and minimum values of the 'Number' column for each group of names and fruits. The result is a DataFrame with one column per aggregation function.
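When the default column names (sum, mean, max, min) are not descriptive enough, pandas also supports named aggregation, where each output column is given as name=(column, function). The sketch below is one way to use it; the output names total, average, largest, and smallest are made up for this example, and the dataframe is rebuilt without the Date column.

```python
import pandas as pd

# Same Fruit, Name, and Number values as the example dataframe (Date omitted).
df = pd.DataFrame({
    'Fruit': ['Apples'] * 5 + ['Oranges'] * 5 + ['Grapes'] * 5,
    'Name': ['Bob', 'Bob', 'Mike', 'Steve', 'Bob', 'Bob', 'Tom', 'Mike',
             'Bob', 'Tony', 'Bob', 'Tom', 'Bob', 'Bob', 'Tony'],
    'Number': [7, 8, 9, 10, 1, 2, 15, 57, 65, 1, 1, 87, 22, 12, 15],
})

# Named aggregation: output_column=(input_column, aggregation_function).
summary = df.groupby(['Name', 'Fruit']).agg(
    total=('Number', 'sum'),
    average=('Number', 'mean'),
    largest=('Number', 'max'),
    smallest=('Number', 'min'),
)
print(summary)

# Bob's Oranges rows hold 2 and 65, so the total is 67 and the mean is 33.5.
print(summary.loc[('Bob', 'Oranges'), 'total'])    # 67
print(summary.loc[('Bob', 'Oranges'), 'average'])  # 33.5
```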

Conclusion

In this article, we have learned how to use the Pandas groupby function to group data and calculate the sum of a column. We have also seen that the same approach extends to other aggregations, such as the mean, maximum, and minimum, through the agg method. With groupby, group-wise operations like these take only a line or two of code.