44  Sorting, previewing and more with dataframes

Tip

To allow all the exercises in this section to work, please run this code cell first!

This will import pandas and load the dataframe we’ll be working with.

44.1 Sorting values

We can sort values easily in pandas. Let’s imagine we want to sort our original DataFrame by age, then by Patient ID:
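As a minimal sketch, assume a small made-up DataFrame with PatientID as the index and an Age column (the values here are invented for illustration and are not the course dataset):

```python
import pandas as pd

# Hypothetical stand-in data; the real dataset has more rows and columns.
df = pd.DataFrame(
    {"Age": [52, 34, 71, 45]},
    index=pd.Index([104, 101, 103, 102], name="PatientID"),
)

# Sort rows by Age, modifying df directly (the call itself returns None).
df.sort_values(by=["Age"], inplace=True)
print(df)
```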

Passing inplace=True to sort_values changes the original DataFrame, so we don’t need to assign the result back.

These two lines are equivalent:

df.sort_values(by=["Age"], inplace=True)

df = df.sort_values(by=["Age"])

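Where several rows have the same value in the column we are sorting by, pandas does not guarantee their relative order with the default sorting algorithm (quicksort). If tied rows should keep their existing order (for example, because the DataFrame is already ordered by the PatientID index), pass kind="stable". Otherwise, add a second sort key.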
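If the order of rows with the same age matters, one way to make it explicit is to give the index name as a second sort key. This sketch assumes the index is named PatientID and uses invented values:

```python
import pandas as pd

# Hypothetical data with tied ages and an unsorted PatientID index.
df = pd.DataFrame(
    {"Age": [40, 30, 40, 30]},
    index=pd.Index([4, 3, 1, 2], name="PatientID"),
)

# `by` accepts index level names as well as column labels,
# so ties on Age are broken by PatientID.
sorted_df = df.sort_values(by=["Age", "PatientID"])
print(sorted_df)
```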

We can sort in the other direction by passing the argument ascending=False.
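For example, with the same kind of made-up data (values are illustrative only):

```python
import pandas as pd

# Hypothetical stand-in data.
df = pd.DataFrame(
    {"Age": [52, 34, 71, 45]},
    index=pd.Index([104, 101, 103, 102], name="PatientID"),
)

# ascending=False puts the largest Age first.
# Without inplace=True, df itself is left unchanged.
oldest_first = df.sort_values(by=["Age"], ascending=False)
print(oldest_first)
```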

We can also sort by multiple features at once.
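A short sketch of this, assuming hypothetical County and Age columns (the names and values are invented for illustration):

```python
import pandas as pd

# Hypothetical stand-in data.
df = pd.DataFrame(
    {
        "County": ["Kent", "Devon", "Kent", "Devon"],
        "Age": [50, 60, 40, 30],
    },
    index=pd.Index([1, 2, 3, 4], name="PatientID"),
)

# Sort by County first, then by Age within each county.
by_county_then_age = df.sort_values(by=["County", "Age"])
print(by_county_then_age)
```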

If we want to sort these in different orders, we can pass a list of booleans (True/False) to the ascending argument, in the same order as our sorting columns.
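A minimal sketch, assuming hypothetical column names FluVaccine, County and Age (the real column names and values may differ):

```python
import pandas as pd

# Hypothetical stand-in data.
df = pd.DataFrame(
    {
        "FluVaccine": ["Yes", "No", "Yes", "No", "No", "Yes"],
        "County": ["Devon", "Kent", "Kent", "Devon", "Kent", "Devon"],
        "Age": [30, 50, 40, 60, 70, 20],
    },
    index=pd.Index([1, 2, 3, 4, 5, 6], name="PatientID"),
)

# One boolean per sort column, in the same order as `by`.
mixed_order = df.sort_values(
    by=["FluVaccine", "County", "Age"],
    ascending=[True, False, False],
)
print(mixed_order)
```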

This will

  • first sort by the flu vaccine column in ascending order
  • then, within each group for the flu vaccine column, it will sort by county in descending order
  • finally, within each group for the county column, it will sort people by age in descending order

44.2 Other neat pandas features

44.2.1 Describe

DataFrames have a describe() method that gives us a quick overview of the numerical data in our DataFrame:
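As an illustration on invented data (the County column is hypothetical and is only there to show that non-numeric columns are left out by default):

```python
import pandas as pd

# Hypothetical stand-in data with one numeric and one text column.
df = pd.DataFrame(
    {
        "Age": [20, 30, 40, 50, 60],
        "County": ["Kent", "Devon", "Kent", "Devon", "Kent"],
    }
)

# describe() reports count, mean, std, min, quartiles and max
# for the numeric columns only (by default).
summary = df.describe()
print(summary)
```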

44.2.2 Previewing the dataset

44.2.2.2 tail()

We can see the last few entries (five by default) using tail()

or we can specify a different number.
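Both forms, sketched on a made-up ten-row DataFrame (values are illustrative only):

```python
import pandas as pd

# Hypothetical stand-in data: ten patients with IDs 1 to 10.
df = pd.DataFrame(
    {"Age": list(range(20, 30))},
    index=pd.Index(list(range(1, 11)), name="PatientID"),
)

last_five = df.tail()    # default: the last 5 rows
last_three = df.tail(3)  # or ask for a specific number of rows
print(last_five)
print(last_three)
```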

44.2.2.3 sample()

We can use the sample() method to see a random selection of rows (a single row by default).

or we can specify a different number.
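A short sketch on made-up data. The random_state argument is optional; it is used here only so the same rows come back each time the code runs:

```python
import pandas as pd

# Hypothetical stand-in data: ten patients with IDs 1 to 10.
df = pd.DataFrame(
    {"Age": list(range(20, 30))},
    index=pd.Index(list(range(1, 11)), name="PatientID"),
)

one_row = df.sample()                          # default: a single random row
three_rows = df.sample(n=3, random_state=42)   # three rows, reproducibly
print(one_row)
print(three_rows)
```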

44.2.3 Calculating statistics on a single column

It’s easy to take the mean of a single column, or of every numeric column in a DataFrame.
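For instance, assuming a made-up DataFrame with an Age column plus hypothetical Visits and County columns:

```python
import pandas as pd

# Hypothetical stand-in data.
df = pd.DataFrame(
    {
        "Age": [20, 30, 40, 50],
        "Visits": [1, 2, 3, 2],
        "County": ["Kent", "Devon", "Kent", "Devon"],
    }
)

# Mean of one column.
mean_age = df["Age"].mean()

# Mean of every numeric column; numeric_only=True skips the text column.
column_means = df.mean(numeric_only=True)
print(mean_age)
print(column_means)
```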

There are various other options too!
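A few of the other summary methods available on a column, sketched on invented values:

```python
import pandas as pd

# Hypothetical stand-in data.
df = pd.DataFrame({"Age": [20, 30, 40, 50, 60]})

median_age = df["Age"].median()
youngest = df["Age"].min()
oldest = df["Age"].max()
age_std = df["Age"].std()  # sample standard deviation (ddof=1) by default
print(median_age, youngest, oldest, age_std)
```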

We could also take the sum of a column in a dataset where that makes sense.
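A total is meaningful for counts but not for something like age. As a sketch, assume a hypothetical Appointments column holding the number of appointments per patient (the column is invented for illustration):

```python
import pandas as pd

# Hypothetical stand-in data: appointments attended per patient.
df = pd.DataFrame(
    {"Appointments": [2, 0, 5, 1]},
    index=pd.Index([1, 2, 3, 4], name="PatientID"),
)

# Total number of appointments across all patients.
total_appointments = df["Appointments"].sum()
print(total_appointments)
```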