The slides, exercises and content from the HSMA sessions are freely available for everyone to access!
Click into the module you are interested in to find a full list of the sessions within that module, including links to the session recording, the slides, and the code-based exercises and solutions.
Modules are listed in the order they are delivered on the course. Module 1 is essential groundwork for all of the others, but most of the remaining modules can be taken independently.
The only exceptions are:
- deploying reproducible reports online in module 8 (an optional exercise), which relies on GitHub content from session 7A
- natural language processing in module 5, which benefits from completing module 4 (machine learning) first
Module 1: Introduction to Operational Research, Data Science and Programming
This module serves as an introduction to the fields of operational research and data science, as well as ensuring you have the tools set up and all of the Python knowledge you need to access the later sessions delivered on the course.
It covers
• How the HSMA course is structured
• Why we should use Free and Open Source (FOSS) software
• What Operational Research and Data Science are
• The conceptual modelling process
• The concepts underpinning coding, such as conditional logic, the object-oriented paradigm, and more
• How to set up the VSCode Integrated Development Environment (IDE) for Python Programming
• How to set up environments for package management throughout the HSMA course using Anaconda or the built-in environment features of VSCode
• How to write code using the Python programming language
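To give a flavour of the Python fundamentals listed above, here is a minimal sketch showing conditional logic and a simple class from the object-oriented paradigm (the names and numbers are purely illustrative, not taken from the course materials):

```python
class Clinic:
    """A minimal example of the object-oriented paradigm."""

    def __init__(self, name, daily_capacity):
        self.name = name
        self.daily_capacity = daily_capacity

    def can_accept(self, referrals):
        # Conditional logic: compare demand against capacity
        return referrals <= self.daily_capacity


clinic = Clinic("Hypothetical Clinic", daily_capacity=20)

for referrals in [15, 25]:
    if clinic.can_accept(referrals):
        print(f"{referrals} referrals: within capacity")
    else:
        print(f"{referrals} referrals: over capacity")
```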
Module 2: Modelling Pathway and Queuing Problems
This module serves as an introduction to modelling pathway problems with discrete event simulation (DES).
It covers
• What DES is and where it may be useful
• The key terminology associated with DES (e.g. resources, entities, sinks)
• How to simplify a real-world pathway modelling problem into a conceptual model
• Python generator functions
• The features of the SimPy package
• How to write simple simulations in Python with SimPy
• How to deal with multi-step pathways
• How to deal with branching pathways
• How to build more complex features into your pathway models, including warm-up periods, priority-based queuing, resource unavailability, lognormal distributions, reneging, balking, and jockeying
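To give a flavour of the SimPy models built in this module, here is a minimal single-resource queue sketch (the arrival and treatment times are illustrative values rather than figures used on the course):

```python
import random

import simpy


def patient(env, name, nurse):
    """A patient arrives, queues for the nurse, is treated, then leaves."""
    arrival = env.now
    with nurse.request() as req:
        yield req                                      # queue until the nurse is free
        print(f"{name} waited {env.now - arrival:.1f} minutes")
        yield env.timeout(random.expovariate(1 / 10))  # ~10 minute treatment


def arrivals(env, nurse):
    """Generate patients with exponentially distributed inter-arrival times."""
    i = 0
    while True:
        yield env.timeout(random.expovariate(1 / 8))   # ~8 minutes between arrivals
        i += 1
        env.process(patient(env, f"Patient {i}", nurse))


env = simpy.Environment()
nurse = simpy.Resource(env, capacity=1)
env.process(arrivals(env, nurse))
env.run(until=120)  # run for 120 simulated minutes
```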
Module 3: Geographic Modelling and Visualisation
This module serves as an introduction to creating maps and solving geographical problems in Python.
The module begins with an overview of key geographic concepts, including projections, coordinate reference systems, and levels of geographic data in use within the UK.
It then moves on to the free program QGIS, which allows maps to be made in a powerful point-and-click tool without writing any code. In QGIS, we cover
• Loading in geographic data stored in a range of formats, including geoJSON, geopackages, shapefiles, and csv files
• Displaying point data
• Displaying area data (creation of choropleths)
• Enhancing QGIS maps with labels, custom colour groups, icons, and more
• Creating custom print layouts to display one or multiple QGIS maps along with titles, text and legends
The next part of the module covers working with geographic data in Python.
• Loading in geographic data (e.g. geoJSON, geopackages, shapefiles) with the geopandas package
• Converting pandas dataframes into geopandas dataframes
• Visualising point data and area data in static maps using matplotlib
• Customising and polishing static matplotlib maps
• Visualising point data and area data in interactive maps using folium
• Working with and visualising travel time matrices
• Obtaining travel time data for multiple modes of transport from free APIs using the routingpy package
• Obtaining and visualising isochrone data
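As a brief illustration of the geopandas workflow covered in this part of the module, the sketch below loads a boundary file, converts a plain pandas dataframe of points into a geodataframe, and plots both with matplotlib (the file name, coordinates and column names are placeholders rather than files supplied with the course):

```python
import geopandas as gpd
import matplotlib.pyplot as plt
import pandas as pd

# Load area boundaries from a geoJSON file (placeholder path)
areas = gpd.read_file("local_authority_boundaries.geojson")

# Convert a plain dataframe of clinic coordinates into a geodataframe
clinics = pd.DataFrame({"name": ["Clinic A", "Clinic B"],
                        "longitude": [-3.53, -4.14],
                        "latitude": [50.72, 50.37]})
clinics_gdf = gpd.GeoDataFrame(
    clinics,
    geometry=gpd.points_from_xy(clinics["longitude"], clinics["latitude"]),
    crs="EPSG:4326",
)

# Plot the areas with the clinic points layered on top
fig, ax = plt.subplots(figsize=(8, 8))
areas.to_crs("EPSG:4326").plot(ax=ax, color="lightgrey", edgecolor="white")
clinics_gdf.plot(ax=ax, color="red", markersize=30)
ax.set_axis_off()
plt.show()
```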
The module ends with an exploration of how to solve geographic optimisation problems in Python: placing a number of sites in a way that minimises an objective such as average travel time or distance. The module gives code examples for brute-force solutions, but also touches on the more advanced evolutionary and genetic algorithm approaches and the value of multi-objective optimisation.
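The brute-force approach can be sketched in a few lines: evaluate every possible combination of candidate sites against a travel time matrix and keep the combination with the lowest average travel time (the matrix below is a small made-up example):

```python
from itertools import combinations

import numpy as np

# Rows are demand locations, columns are candidate sites (illustrative travel times in minutes)
travel_times = np.array([
    [10, 25, 40],
    [30, 12, 35],
    [45, 28, 8],
    [20, 18, 30],
])

n_sites_to_place = 2
best_choice, best_score = None, float("inf")

# Brute force: try every combination of candidate sites
for sites in combinations(range(travel_times.shape[1]), n_sites_to_place):
    # Each demand location travels to its nearest chosen site
    nearest = travel_times[:, list(sites)].min(axis=1)
    score = nearest.mean()
    if score < best_score:
        best_choice, best_score = sites, score

print(f"Best sites: {best_choice}, average travel time: {best_score:.1f} minutes")
```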
Module 4: Machine Learning
This module serves as an introduction to machine learning methods in Python. Students make use of packages including scikit-learn and tensorflow to train and assess machine learning models that can classify data into groups or predict a numeric value (e.g. length of stay).
It covers
• Key machine learning concepts, such as train-test splits, accuracy, the bias-variance trade-off, and more
• Preparing data for machine learning with one-hot encoding and feature engineering
• Machine learning ethics
• Tackling classification problems with machine learning
• Tackling regression problems (numeric predictions) with machine learning
• The theory of, and how to apply, a wide range of algorithms using the scikit-learn package: logistic regression, decision trees, random forests, XGBoost, LightGBM, and more
• Creating ensemble models
• Assessing the performance of machine learning models in classification and regression problems with accuracy, precision, recall, F1 score, confusion matrices, the receiver operating characteristic (ROC) curve, the ROC AUC, and more
• How to explain the activity of black-box models with techniques such as individual conditional expectation (ICE) plots, partial dependence plots (PDPs), SHAP values, and more
• Calibrating models to obtain reliable prediction probabilities
• Optimising model parameters efficiently with grid search and the optuna framework
• Optimising the selected features with a range of feature selection approaches
• Creating neural networks for classification problems with the tensorflow package
• The principles of reinforcement learning
• Creating synthetic data with SMOTE
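To show the overall shape of the scikit-learn workflow covered in this module, here is a minimal classification sketch using a small example dataset bundled with scikit-learn (your own data, model choice and evaluation approach will differ):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Load a small example dataset bundled with scikit-learn
X, y = load_breast_cancer(return_X_y=True)

# Train-test split: hold back unseen data so the model can be assessed fairly
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Fit a random forest classifier and assess it on the held-back data
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")
print(classification_report(y_test, y_pred))
```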
Module 5: Natural Language Processing
In this module we explore a range of techniques that can help to make sense of the reams of unstructured text-based data that exists in healthcare systems around the world.
Starting off with an overview of the terminology and key concepts of the field of natural language processing (NLP), we then work on processing some real text data and exploring the frequency of words and phrases within it, using this to create wordcloud visualisations of these texts.
We then move on to Named Entity Recognition (NER), exploring a library we can use to automatically recognise anything from people to places.
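As a minimal sketch of what NER looks like in practice, the example below uses spaCy, one widely used library for this task (shown here purely as an illustration; it assumes the small English model has been downloaded):

```python
import spacy

# Requires the small English model: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

doc = nlp("The patient was transferred from Derriford Hospital in Plymouth "
          "on Tuesday and reviewed by Dr Smith.")

# Print each named entity alongside its predicted label (e.g. PERSON, GPE, DATE)
for ent in doc.ents:
    print(ent.text, ent.label_)
```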
Finally, we move on to sentiment analysis - a technique more closely related to the machine learning work we did in module 4. We explore how to train a neural network that can identify whether a piece of text is positive or negative in tone, providing an insight into a flexible class of machine learning model that can be applied to other text classification tasks.
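The sketch below shows the rough shape of such a text classifier using the Keras API in tensorflow, with a deliberately tiny made-up dataset standing in for real labelled text (this is an illustration of the idea rather than the exact model built in the session):

```python
import tensorflow as tf

# Tiny illustrative dataset: 1 = positive tone, 0 = negative tone
texts = tf.constant(["great service, very helpful staff",
                     "long wait and nobody kept us informed"])
labels = tf.constant([1.0, 0.0])

# Turn the raw strings into padded sequences of integer token ids
vectorise = tf.keras.layers.TextVectorization(max_tokens=1000,
                                              output_sequence_length=20)
vectorise.adapt(texts)
X = vectorise(texts)

# A very small neural network for binary text classification
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=1000, output_dim=16),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, labels, epochs=10, verbose=0)

# Score a new piece of text: values nearer 1 suggest a positive tone
print(model.predict(vectorise(tf.constant(["friendly and quick"]))))
```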
This year, the module ended with a 6-hour hackathon in which students were able to work in their peer support groups on a natural language processing task of their choice. While there is no taught content for this session, the presentations the groups gave about their projects from the day are available to watch.
Module 6: Modelling Behaviour and System Dynamics
In module 6, we explore two more techniques for modelling pathways and systems.
We kick off with system dynamics - a powerful tool for qualitative or quantitative (numeric) simulations of systems at a high level. System Dynamics excels at unpicking fundamental issues with the structure of systems. This session is codeless, using the online InsightMaker platform instead.
We then move on to agent-based simulation (ABS) - a simulation technique that is well suited to situations where we are interested in the emergent behaviour that arises from the interactions and decisions of individuals. ABS is commonly used in disease spread modelling; we explore how to write our own simulations using the MESA package, as well as digging into its possible applications in a more general healthcare context.
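To give a sense of what an agent-based model looks like in code, here is a minimal disease-spread sketch written in the style of the Mesa 2.x API (the exact API differs between Mesa versions, and the parameters are made-up illustrative values):

```python
import random

from mesa import Agent, Model
from mesa.time import RandomActivation


class Person(Agent):
    """An agent who may become infected through contact with others."""

    def __init__(self, unique_id, model, infected=False):
        super().__init__(unique_id, model)
        self.infected = infected

    def step(self):
        # Each step, meet one randomly chosen person; infection may pass on
        other = random.choice(self.model.schedule.agents)
        if self.infected and not other.infected and random.random() < 0.1:
            other.infected = True


class InfectionModel(Model):
    """A deliberately simple model of infection spreading through a population."""

    def __init__(self, n_people=50):
        super().__init__()
        self.schedule = RandomActivation(self)
        for i in range(n_people):
            self.schedule.add(Person(i, self, infected=(i == 0)))

    def step(self):
        self.schedule.step()


model = InfectionModel()
for _ in range(20):
    model.step()

print(sum(agent.infected for agent in model.schedule.agents), "people infected")
```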
Module 7: Open Source Collaborative Development and Web Development
In this module we start by taking a look at a crucial skill for the analyst's, data scientist's or operational researcher's toolbox - version control with Git. Git ensures that the full history of your code is available, allowing you to
• roll back to earlier versions of code
• keep a clean working copy of code safe and accessible while being able to start working on new features
Say goodbye to V2, V3_FINAL, and V_ACTUAL_FINAL_VERSION - with Git, you will always know where you are.
We then explore the online platform GitHub, which allows for hosting of code folders that are controlled using Git. GitHub is possibly the most used code-sharing and collaboration platform in the world, and is a crucial part of ensuring your code is available for others to audit, use and build on.
We then move on to creating web interfaces. With the advent of a range of tools that are designed to make it easier for people to create data-driven web apps and interactive model interfaces without having to learn the intricacies of traditional web development, it's a great time to start developing your own interfaces to get your scripts into the hands of others - without them having to install or use the code directly.
In this module, we'll be working with the Streamlit framework - a beginner-friendly but powerful Python library that will get you writing your first interactive apps in minutes.
We will cover
• creating simple apps
• adding interactive elements like sliders, text and numeric inputs, dropdowns, and more
• adding outputs such as dataframes, interactive plots, and maps
• theming your app
• using layout elements such as tabs, columns, expanders, sidebars, and multi-page navigation
• improving the running of your app with advanced features like caching, fragments and session state
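To show just how little code a basic app needs, here is a minimal Streamlit sketch covering a few of the elements listed above (the data is randomly generated for illustration):

```python
# app.py: a minimal Streamlit sketch (run with: streamlit run app.py)
import numpy as np
import pandas as pd
import streamlit as st

st.title("Demand explorer (illustrative example)")

# Interactive inputs
weeks = st.slider("Number of weeks to simulate", min_value=4, max_value=52, value=12)
mean_demand = st.number_input("Average weekly referrals", value=100)

# Generate some made-up data driven by the inputs
rng = np.random.default_rng(42)
data = pd.DataFrame({
    "week": range(1, weeks + 1),
    "referrals": rng.poisson(mean_demand, size=weeks),
})

# Outputs: a dataframe and an interactive chart
st.dataframe(data)
st.line_chart(data, x="week", y="referrals")
```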
We will round off the module by practising our Git and GitHub skills, uploading our apps to a GitHub repository so we can then host them on the Streamlit Community Cloud platform.
Module 8: Modern Analytics
This module serves to introduce some additional techniques that are useful parts of the modern analyst's skillset.
We start with a session on reproducible reporting, exploring how we can create beautiful HTML, PDF and Word reports with Quarto. Quarto allows code - and the output of code - to be interwoven with text, headings, images, videos, and much more. We explore standard document layouts as well as dashboard-style layouts, and cover a range of useful Quarto parameters and tips.
With Quarto, we also look at automation and parameterisation of reports, exploring how to generate multiple reports just by running a simple Python script.
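As a rough sketch of what this automation can look like, a short Python script can call Quarto's command line interface once per parameter value (the report file name and parameter name below are hypothetical, and this assumes the quarto command is installed and on your path):

```python
import subprocess

# Hypothetical parameterised Quarto report and the values to render it for
report = "monthly_report.qmd"
teams = ["Team A", "Team B", "Team C"]

for team in teams:
    output_file = f"monthly_report_{team.replace(' ', '_').lower()}.html"
    # quarto render accepts execution parameters via -P name:value
    subprocess.run(
        ["quarto", "render", report,
         "-P", f"team:{team}",
         "--output", output_file],
        check=True,
    )
```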
In the final session of HSMA 6, we take a look at time-series forecasting methods, which take historical patterns and use them to predict the future. The session covers a range of time-series forecasting approaches, the circumstances under which you might consider using each, and how to assess their performance.
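As one small taste of what a forecast looks like in code, the sketch below fits a Holt-Winters exponential smoothing model with statsmodels to a made-up monthly series and forecasts a year ahead (this is just one of many possible approaches, chosen here purely for illustration):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Made-up monthly attendance series with a gentle trend and yearly seasonality
dates = pd.date_range("2020-01-01", periods=48, freq="MS")
rng = np.random.default_rng(42)
values = (1000 + np.arange(48) * 5
          + 100 * np.sin(2 * np.pi * np.arange(48) / 12)
          + rng.normal(0, 30, 48))
series = pd.Series(values, index=dates)

# Fit a Holt-Winters model with additive trend and seasonality
model = ExponentialSmoothing(series, trend="add", seasonal="add",
                             seasonal_periods=12).fit()

# Forecast the next 12 months
print(model.forecast(12))
```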