43  Aggregating Data in Dataframes

Tip

To allow all the exercises in this section to work, please run this code cell first!

This will import pandas and load the dataframe we’ll be working with.

43.1 Value Counts

We can quickly count the number of values for categorical data in a Pandas DataFrame. Let’s say we want to look at the number of those who have vs haven’t been vaccinated in our data.

We can combine the above with an indexing operation to, for example, give us the value counts of flu vaccines but only for the county of Cornwall :

43.2 Pivot Tables

We can create Pivot Tables quickly and easily in Python (and far more computationally efficiently than Excel, particularly for very large datasets). The resultant Pivot Tables we create are also themselves Pandas DataFrames, and so we can work with them in the same way as any other DataFrame.

Let’s imagine we want to look at the average age by county in our DataFrame :

To create the table above, we specified

  • the name of the DataFrame on which we want to pivot
  • the column we will use as the index
  • the column(s) containing our data of interest
  • and the function we want to perform on the data of interest.

The above basically says “Give me a table that, by county, gives me mean age”

As the resultant Pivot Table is a DataFrame itself, we can work with it like any other DataFrame. For example, to retrieve the mean for Cornwall (remember, County is the index in this new DataFrame) :

To add more than one value of comparison to your pivot table, you need to pass the values in as a list. For example :

You can learn more about pivot tables here.