45 Exercise: Pandas
Now let’s have a go at writing some pandas code.
There are 8 short tasks to complete, using a dataset that is commonly used in machine learning (and which we’ll be using later in the HSMA course): the Titanic dataset from Kaggle. This dataset contains data about the passengers on board the Titanic.
Open Exercise in Google Colab:
When using Colab, you can load the Titanic dataset CSV directly from the HSMA repository. The following will work as a starting point:
pd.read_csv("https://raw.githubusercontent.com/hsma-programme/h6_1f_python_part_2/main/1f_python_programming_part_2/titanic_dataset.csv",
    # INSERT THE REST OF YOUR IMPORT CODE HERE
)
45.1 Sample Answers
Open exercise solutions in Google Colab:
You will need to change the second line in the first code cell (where we import the dataset) to the following to make this work in Google Colab:
titanic_df = pd.read_csv("https://raw.githubusercontent.com/hsma-programme/h6_1f_python_part_2/main/1f_python_programming_part_2/titanic_dataset.csv", index_col="PassengerId")
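To see what the `index_col` argument changes without downloading anything, here is a minimal sketch that reads a tiny CSV held in memory. The three rows, and the `Name` and `Age` values in them, are made up for illustration and are not real rows from the Titanic dataset; only the `PassengerId` column name is taken from the exercise.

```python
import io

import pandas as pd

# Made-up sample data for illustration only (not real Titanic rows)
csv_text = """PassengerId,Name,Age
1,Passenger A,22
2,Passenger B,38
3,Passenger C,26
"""

# Without index_col, pandas creates a default integer index starting at 0
default_df = pd.read_csv(io.StringIO(csv_text))

# With index_col, the PassengerId column becomes the row index instead
indexed_df = pd.read_csv(io.StringIO(csv_text), index_col="PassengerId")

print(default_df.index.tolist())
print(indexed_df.index.tolist())

# Rows can now be looked up by passenger ID using .loc
print(indexed_df.loc[2, "Name"])
```

With the index set this way, `PassengerId` no longer appears as an ordinary column, and `.loc` looks rows up by passenger ID rather than by position.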
45.2 Answer Video
45.3 Importing csvs into pandas
NOTE! Part of the exercise, setting the index column, was forgotten in this video. The second video corrects the answer by setting the PassengerId column as the index.
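If the index step is missed at import time, as happened in the video, it can also be applied afterwards. The sketch below is one possible way to do this using `set_index`, not necessarily the approach taken in the videos. It uses a small made-up table rather than the real dataset, and the name `sample_df` is hypothetical.

```python
import io

import pandas as pd

# Made-up sample data for illustration only (not real Titanic rows)
csv_text = """PassengerId,Name,Age
1,Passenger A,22
2,Passenger B,38
"""

# Import without setting an index (the forgotten step)
sample_df = pd.read_csv(io.StringIO(csv_text))

# Fix it afterwards: set_index returns a new DataFrame, so reassign it
sample_df = sample_df.set_index("PassengerId")

print(sample_df.index.name)
print(sample_df.columns.tolist())
```

Note that `set_index` does not modify the DataFrame in place by default, which is why the result is assigned back to the same variable.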