44  Exercise: Pandas

Now let’s have a go at using some pandas code.

There are 8 short tasks to complete, all using a dataset that is common in Machine Learning (and which we’ll use again later in the HSMA course): the Titanic Dataset from Kaggle. This dataset contains data about the passengers on board the Titanic.

Open Exercise in Google Colab: Open In Colab

When using Colab, you can read the Titanic dataset CSV directly from the HSMA repository instead of a local file. The following will work:

pd.read_csv("https://raw.githubusercontent.com/hsma-programme/h6_1f_python_part_2/main/1f_python_programming_part_2/titanic_dataset.csv",
            # INSERT THE REST OF YOUR IMPORT CODE HERE
            )

44.1 Sample Answers

Open exercise solutions in Google Colab: Open In Colab

You will need to change the second line in the first code cell (where we import the dataset) to the following to make this work in Google Colab:

titanic_df = pd.read_csv("https://raw.githubusercontent.com/hsma-programme/h6_1f_python_part_2/main/1f_python_programming_part_2/titanic_dataset.csv", index_col="PassengerId")
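
To see what `index_col` does without downloading anything, here is a minimal sketch that reads a tiny in-memory CSV. The three rows and the variable names `toy_csv` and `toy_df` are made up for illustration and are not taken from the Titanic dataset; only the `PassengerId` column name comes from the exercise.

```python
import io

import pandas as pd

# Toy CSV with invented rows (NOT real Titanic data); only the
# PassengerId column name is taken from the exercise.
toy_csv = io.StringIO(
    "PassengerId,Name,Age\n"
    "1,Passenger A,30\n"
    "2,Passenger B,41\n"
    "3,Passenger C,25\n"
)

# index_col tells pandas to use this column as the row labels
# instead of creating a default 0, 1, 2, ... index.
toy_df = pd.read_csv(toy_csv, index_col="PassengerId")

print(toy_df.index.name)      # the index is now named after the column
print(list(toy_df.columns))   # PassengerId is no longer a regular column
print(toy_df.loc[2, "Name"])  # rows can be looked up by PassengerId
```

Because the index now holds the passenger IDs, `.loc` selects rows by ID rather than by position.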

44.2 Answer Video

44.3 Importing csvs into pandas

NOTE! One part of the exercise, setting the index column, was forgotten in this video. The second video corrects the answer by adding the step that sets the PassengerId column as the index.
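
If a DataFrame has already been imported without an index column, one possible way to apply the missing step afterwards is `set_index`. This is only a sketch on invented data (the rows and the names `raw_csv` and `passengers_df` are hypothetical, not from the Titanic dataset), and it is not necessarily the approach taken in the videos.

```python
import io

import pandas as pd

# Toy CSV with invented rows (NOT real Titanic data).
raw_csv = io.StringIO(
    "PassengerId,Name,Age\n"
    "1,Passenger A,30\n"
    "2,Passenger B,41\n"
    "3,Passenger C,25\n"
)

# Imported without index_col, so pandas creates a default 0, 1, 2 index
# and PassengerId stays an ordinary column.
passengers_df = pd.read_csv(raw_csv)

# set_index returns a new DataFrame with PassengerId moved into the index.
passengers_df = passengers_df.set_index("PassengerId")

print(passengers_df.head())
```

The result matches what you get by passing `index_col="PassengerId"` to `pd.read_csv` in the first place.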

44.4 Working with data using pandas