df.sort_values(by=["Age"], inplace=True)
df = df.sort_values(by=["Age"])
44 Sorting, previewing and more with dataframes
To allow all the exercises in this section to work, please run this code cell first!
This will import pandas and load the dataframe we’ll be working with.
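The setup cell itself is not reproduced here, so the following is only a minimal stand-in. The data values and the column names `County` and `Flu_Vaccine` are hypothetical. The chapter's real dataset will differ, but `Age` and a `PatientID` index match the names used in the text.

```python
import pandas as pd

# Hypothetical stand-in data; the chapter's real dataset is loaded from elsewhere.
df = pd.DataFrame(
    {
        "PatientID": [101, 102, 103, 104, 105, 106],
        "Age": [34, 51, 60, 67, 29, 51],
        "County": ["Devon", "Cornwall", "Devon", "Somerset", "Cornwall", "Devon"],
        "Flu_Vaccine": [True, False, True, True, False, False],
    }
).set_index("PatientID")  # use PatientID as the index, as in the text

print(df)
```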
44.1 Sorting values
We can sort values easily in pandas. Let’s imagine we want to sort our original DataFrame by age, then by patient ID:
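A minimal sketch of an in-place sort by age, using a small made-up DataFrame (the values are assumptions, not the chapter's data):

```python
import pandas as pd

# Made-up example data with PatientID as the index
df = pd.DataFrame(
    {"PatientID": [101, 102, 103, 104], "Age": [51, 34, 67, 29]}
).set_index("PatientID")

# inplace=True modifies df directly; the call itself returns None
result = df.sort_values(by=["Age"], inplace=True)

print(df)
```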
The above line changes the original DataFrame because we’ve set inplace to True, so we don’t need to assign the result back to a variable.
These two lines are equivalent:
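As a quick check of that equivalence, this sketch applies both forms to copies of the same made-up DataFrame and compares the results:

```python
import pandas as pd

# Made-up example data
original = pd.DataFrame(
    {"PatientID": [101, 102, 103, 104], "Age": [51, 34, 67, 29]}
).set_index("PatientID")

# Form 1: sort in place, no assignment needed
df_a = original.copy()
df_a.sort_values(by=["Age"], inplace=True)

# Form 2: sort_values returns a new sorted DataFrame, which we assign back
df_b = original.copy()
df_b = df_b.sort_values(by=["Age"])

same = df_a.equals(df_b)
print(same)
```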
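Where the values are equal for the column we are sorting by, rows will often stay in index order (PatientID here), but this is not guaranteed: the default algorithm for a single-column sort (quicksort) is not stable, so tied rows can come back in any order. If the order of ties matters, add the index name as a second sort key or choose a stable algorithm with the kind argument.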
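If we want the ordering of tied rows to be explicit rather than left to the sorting algorithm, two options are sketched below on made-up data. The first names the index level as a second sort key. The second requests a stable algorithm, which keeps tied rows in their original order.

```python
import pandas as pd

# Made-up data with tied ages and an index that is NOT already in order
df = pd.DataFrame(
    {"PatientID": [104, 101, 103, 102], "Age": [34, 51, 34, 51]}
).set_index("PatientID")

# Option 1: break ties by the index level, referenced by its name
explicit = df.sort_values(by=["Age", "PatientID"])

# Option 2: mergesort is stable, so tied rows keep their original order
stable = df.sort_values(by=["Age"], kind="mergesort")

print(explicit)
print(stable)
```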
We can sort in the other direction by passing the argument ascending=False.
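For instance, a descending sort by age on a small made-up DataFrame might look like this:

```python
import pandas as pd

# Made-up example data
df = pd.DataFrame(
    {"PatientID": [101, 102, 103, 104], "Age": [51, 34, 67, 29]}
).set_index("PatientID")

# ascending=False puts the largest values first
oldest_first = df.sort_values(by=["Age"], ascending=False)

print(oldest_first)
```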
We can also sort by multiple features at once.
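A short sketch of sorting by two columns. The `County` column name and the values are assumptions made for illustration:

```python
import pandas as pd

# Made-up example data; "County" is a hypothetical column name
df = pd.DataFrame(
    {
        "PatientID": [101, 102, 103, 104],
        "County": ["Devon", "Cornwall", "Devon", "Cornwall"],
        "Age": [51, 34, 29, 67],
    }
).set_index("PatientID")

# Sort by County first, then by Age within each county
by_county_then_age = df.sort_values(by=["County", "Age"])

print(by_county_then_age)
```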
If we want to sort these in different orders, we can pass a list of booleans (True/False) to the ascending argument, in the same order as our sorting columns.
This will
- first sort by the flu vaccine column in ascending order
- then, within each group for the flu vaccine column, it will sort by county in descending order
- finally, within each group for the county column, it will sort people by age in descending order
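The three-level sort described above could be written roughly as follows. The column names `Flu_Vaccine` and `County` and all data values are assumptions, since the original cell is not shown:

```python
import pandas as pd

# Made-up example data; column names other than "Age" are hypothetical
df = pd.DataFrame(
    {
        "PatientID": [101, 102, 103, 104, 105, 106],
        "Flu_Vaccine": [True, False, True, True, False, False],
        "County": ["Devon", "Cornwall", "Devon", "Somerset", "Cornwall", "Devon"],
        "Age": [34, 51, 60, 67, 29, 51],
    }
).set_index("PatientID")

# One boolean per sort column, in the same order as the `by` list:
# Flu_Vaccine ascending, County descending, Age descending
mixed = df.sort_values(
    by=["Flu_Vaccine", "County", "Age"],
    ascending=[True, False, False],
)

print(mixed)
```

Booleans sort with False before True, so unvaccinated patients appear first in this sketch.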
44.2 Other neat pandas features
44.2.1 Describe
A pandas DataFrame has a describe() method that gives us a quick overview of its numerical columns (count, mean, standard deviation, minimum, quartiles and maximum):
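A small sketch on made-up data. Note that, by default, the text column is left out of the summary:

```python
import pandas as pd

# Made-up example data with one numeric and one text column
df = pd.DataFrame(
    {
        "PatientID": [101, 102, 103, 104],
        "Age": [51, 34, 67, 29],
        "County": ["Devon", "Cornwall", "Devon", "Somerset"],
    }
).set_index("PatientID")

# describe() summarises numeric columns only by default
stats = df.describe()

print(stats)
```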
44.2.2 Previewing the dataset
44.2.2.1 head()
We can see the first few rows of our DataFrame using head().
The default is 5 rows, or we can pass a different number.
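Both forms of head(), sketched on a made-up DataFrame with eight rows:

```python
import pandas as pd

# Made-up example data: eight patients
df = pd.DataFrame(
    {
        "PatientID": [101, 102, 103, 104, 105, 106, 107, 108],
        "Age": [51, 34, 67, 29, 45, 72, 38, 55],
    }
).set_index("PatientID")

first_five = df.head()    # default: first 5 rows
first_three = df.head(3)  # first 3 rows

print(first_five)
print(first_three)
```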
44.2.2.2 tail()
We can see the last few rows using tail().
As with head(), the default is 5 rows, or we can pass a different number.
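The same idea for tail(), again on made-up data:

```python
import pandas as pd

# Made-up example data: eight patients
df = pd.DataFrame(
    {
        "PatientID": [101, 102, 103, 104, 105, 106, 107, 108],
        "Age": [51, 34, 67, 29, 45, 72, 38, 55],
    }
).set_index("PatientID")

last_five = df.tail()   # default: last 5 rows
last_two = df.tail(2)   # last 2 rows

print(last_five)
print(last_two)
```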
44.2.2.3 sample()
We can use the sample() method to see a random selection of rows.
By default it returns a single row, or we can specify a different number.
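A sketch of sample() on made-up data. The `random_state` argument is optional and is used here only so that repeated runs pick the same rows:

```python
import pandas as pd

# Made-up example data: eight patients
df = pd.DataFrame(
    {
        "PatientID": [101, 102, 103, 104, 105, 106, 107, 108],
        "Age": [51, 34, 67, 29, 45, 72, 38, 55],
    }
).set_index("PatientID")

one_row = df.sample(random_state=42)        # default: 1 random row
three_rows = df.sample(3, random_state=42)  # 3 random rows, without replacement

print(one_row)
print(three_rows)
```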
44.2.3 Calculating statistics on a single column
It’s easy to take the mean of a single column, or of every numeric column in a DataFrame.
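Both variants, sketched on made-up data. Passing `numeric_only=True` tells pandas to skip the text column when averaging the whole DataFrame:

```python
import pandas as pd

# Made-up example data
df = pd.DataFrame(
    {
        "PatientID": [101, 102, 103, 104],
        "Age": [51, 34, 67, 29],
        "County": ["Devon", "Cornwall", "Devon", "Somerset"],
    }
).set_index("PatientID")

# Mean of a single column (returns a single number)
mean_age = df["Age"].mean()

# Mean of every numeric column (returns a Series indexed by column name)
column_means = df.mean(numeric_only=True)

print(mean_age)
print(column_means)
```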
There are various other options too!
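A few of those other options, shown on a made-up `Age` column. This is a selection rather than a complete list:

```python
import pandas as pd

# Made-up example data
df = pd.DataFrame(
    {"PatientID": [101, 102, 103, 104], "Age": [51, 34, 67, 29]}
).set_index("PatientID")

median_age = df["Age"].median()  # middle value
min_age = df["Age"].min()        # smallest value
max_age = df["Age"].max()        # largest value
std_age = df["Age"].std()        # sample standard deviation

print(median_age, min_age, max_age, std_age)
```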
We could also take the sum of a column where that makes sense.
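Summing ages would not mean much, so this sketch uses a hypothetical `Num_Appointments` column (an assumption, not part of the chapter's dataset) where a total is meaningful:

```python
import pandas as pd

# Made-up example data; "Num_Appointments" is a hypothetical column
df = pd.DataFrame(
    {"PatientID": [101, 102, 103, 104], "Num_Appointments": [2, 0, 5, 1]}
).set_index("PatientID")

# Total number of appointments across all patients
total_appointments = df["Num_Appointments"].sum()

print(total_appointments)
```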