Health Service Modelling Associates Programme
Don’t forget that you can book time to see me.
Slots are available at
Slots are currently available up until the second week of June.
Book via this link
Pop me a message if you can’t make those times and we can work something out.
Notebook LM is a Google product that allows you to pass in specific sources that will be used by a generative AI model.
You can feed it a range of sources, including
(Thanks to Joel for originally making me aware of this tool)
Because it grounds its answers in the specific sources you give it, you can use it to remind yourself of the HSMA way of doing things.
Notebook LM has the really nice feature of actually referring back to the sources it has used
But in code, that means you get the sources appearing in [] in a way that would break the code…
The ‘deep dive’ feature is a way to get a good summary of complex information like scientific papers.
It generates a podcast-like conversation!
It can also generate other formats like an FAQ document.
This repository contains a script to automatically generate documentation from a codebase.
https://github.com/The-Pocket/PocketFlow-Tutorial-Codebase-Knowledge
It’s designed to write beginner-friendly documentation with loads of analogies and clear breakdowns of what is going on.
Documentation is the thing no-one wants to do…
And it always goes out of date!
In addition, if you’re trying to work with someone else’s code, it can be really hard to get your head around how everything is done.
The framework
(I’ve then been converting the output into Quarto for easy integration with existing documentation)
It generates code examples…
And diagrams…
Because of the way this code is designed, you can modify the prompt that gets fed in.
It just lives in text in the nodes.py file.
So you can ask it to make the documentation more or less beginner friendly, or add very specific requests.
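Purely as an illustration (the prompt text and variable names in the real nodes.py will differ), the kind of tweak you might make looks something like this:

# Hypothetical sketch - not the actual prompt from nodes.py
chapter_topic = "Event logging"   # placeholder for whatever the framework passes in

prompt = f"""
Write a beginner-friendly chapter about {chapter_topic}.
Use plenty of analogies and keep any code snippets short.
Finish with a short 'common pitfalls' section.
"""  # the last line is the kind of very specific extra request you could add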
Google’s most advanced model (gemini-2.5-pro-exp-03-25) is currently free for roughly 25 requests a day.
For me, that was enough to generate two sets of documentation per day before hitting the limit.
If there isn’t a billing account linked to your Google account, you shouldn’t be able to exceed the free tier and get charged - but check, and set up spending limits, first!
I’ll put together a quick guide on the ‘how’ in the next few days, but if you’re keen to try it out sooner, have a go at following the instructions in the repository readme (and message me if you get stuck).
And have a read of a sample generated here: https://sammirosser.com/vidigi_autodoc_test/
There’s a new chapter in the DES book on getting distributions from real data!
(And many more - scipy provides functions for over 80…)
Source: By Skbkekas - Own work, CC BY 3.0, https://commons.wikimedia.org/w/index.php?curid=9447142
Source: By Inductiveload - Own work (Original text: self-made, Mathematica, Inkscape), Public Domain, https://commons.wikimedia.org/w/index.php?curid=3817954
How to work out what best represents our data?
We’ll use the fitter package.
Let’s assume we start with a dataframe of historical activity times.
You pass in a list of values…
We can then apply the ‘get_best’ method to our object to return the name and parameters of the best-fitting distribution.
And this can be used to then pass into your model.
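A minimal sketch of that workflow (the activity_time column name and the shortlist of candidate distributions here are just examples):

import pandas as pd
from fitter import Fitter

# Historical activity times (made-up values for illustration)
historical = pd.DataFrame({"activity_time": [12.1, 9.4, 15.2, 11.8, 10.3, 14.7, 8.9, 13.5]})

# Pass the list of observed values to fitter, trying a shortlist of candidate distributions
f = Fitter(historical["activity_time"].values,
           distributions=["gamma", "lognorm", "expon", "norm"])
f.fit()

# get_best returns the best-fitting distribution and its parameters,
# e.g. {'lognorm': {'s': 0.25, 'loc': 1.2, 'scale': 10.9}}
print(f.get_best(method="sumsquare_error"))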
Remembering that our output looked like this:
Running get_nurse_appt_duration() then returns one suitable time from that distribution.
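For a rough idea of what such a function could look like under the hood (a sketch only - the book's implementation may differ, and it assumes a recent version of fitter that returns parameters as keyword arguments):

from scipy import stats

def get_nurse_appt_duration(best_fit):
    # Draw one activity time from the distribution returned by fitter's get_best()
    dist_name, params = list(best_fit.items())[0]
    dist = getattr(stats, dist_name)   # e.g. stats.lognorm
    return dist(**params).rvs()        # one random sample from the fitted distribution

# e.g. duration = get_nurse_appt_duration(f.get_best())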
There is also a new chapter on event logging.
We will be using the term ‘event logging’ to describe the process of generating a step-by-step log of what happens to each entity as they pass through our system.
The resulting file will be an ‘event log’.
…
entity_id: a unique identifier to allow us to follow a given entity through their journey
event_type: this column is used to distinguish between three key kinds of events:
event: this column further breaks down what is happening during each event type, such as what stage of the system people are waiting to interact with
time: this can be an absolute timestamp in the form of a datetime (e.g. 2027-01-01 23:01:47), or a relative timestamp in time units from the start of the simulation.
run: in a multi-run Trial, which run the result came from
We first add an empty list to our model to store the logs…
Then, each time we want to record something about the patient, we add in something like this:
self.event_log.append(
{'patient': entity_identifier,
'pathway': 'My_Pathway_Name',
'event_type': 'arrival_departure', # or 'queue', 'resource_use', or 'resource_use_end'
'event': 'arrival', # or 'depart', or for 'queue' and 'resource_use' or 'resource_use_end' you can determine your own event name
'time': self.env.now}
)
(You could wrap that in a helper function if you preferred!)
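A hedged sketch of what such a helper could look like (the method and argument names are suggestions, not the book's exact code):

def log_event(self, entity_identifier, event_type, event):
    # Append one row to the event log, stamped with the current simulation time
    self.event_log.append({
        'patient': entity_identifier,
        'pathway': 'My_Pathway_Name',
        'event_type': event_type,   # 'arrival_departure', 'queue', 'resource_use' or 'resource_use_end'
        'event': event,             # e.g. 'arrival', 'depart', or your own event name
        'time': self.env.now
    })

# which you would then call like:
# self.log_event(patient_id, 'queue', 'wait_for_nurse')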
In our ‘run’ method and the ‘Trial’ class, you make a few more changes to turn these dictionaries into a final output and save them.
We can then output this - either as a value returned when the model is run, or as a csv (or both).
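As a rough sketch (assuming this sits at the end of the ‘run’ method, after the event log list has been built up):

import pandas as pd

# Turn the list of event dictionaries into a dataframe...
event_log_df = pd.DataFrame(self.event_log)

# ...then return it, write it out as a csv, or both
event_log_df.to_csv("event_log.csv", index=False)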
Once you have this log, you can use it for a wide range of visuals
Note
Library of code snippets coming soon!
(fancy contributing?)
bupaR is an R library for process mining - but we won’t let that stop us!
With our event logs + some handy code snippets, generating some process mapping outputs becomes possible!
There are a few different ways we could get our Python event logs to work with bupaR:
the reticulate package (which runs Python from R) - though due to the complexity of our code, this is likely to run into issues
the rpy2 package (which runs R from Python) - as we only want a little bit of R in a primarily Python project, this might be a better option
Quarto’s features for passing objects like dataframes between R and Python cells
exporting our event log as a csv, importing this into R, and saving the resulting bupaR visuals
You can choose between mean stage time, max, min, etc.
Find some code to help you do this with your own simulation logs: https://des.hsma.co.uk/process_logs_with_bupar.html
In that section you will find
When creating the activity_log object in R, you just give the path to your csv.
Explore other visuals in their documentation: https://bupaverse.github.io/docs/visualize.html
Book chapter coming soon…
Verification = did we build it right (to match our conceptual model, and without bugs?)
Validation = does it actually match the real world well enough to be useful?
Assumptions and simplifications documented and justified
There are a range of tests you could consider running to compare your simulation outputs with your historical data.
| Test | Type of Data | Compares | Purpose | Use Case in Simulation Validation |
|---|---|---|---|---|
| t-test | Continuous | Means between groups | Assess if the average values differ significantly | Compare average response times, service durations, etc. |
| Chi-squared test | Categorical | Frequency distributions | Assess if category frequencies match expected distribution | Compare call types, dispatch priority levels, station allocations |
| Kolmogorov–Smirnov (KS) 2-sample test | Continuous | Entire distributions | Assess if two samples come from the same distribution | Compare distributions of interarrival times, on-scene durations |
import numpy as np
import pytest
from scipy import stats

# Pull out daily number of calls across simulation and reality
sim_calls = np.array(average_monthly_calls['daily_calls']) # simulated data
# basically a list like [4, 6, 12, 5, 7, 4, 4, 19, ...]
real_calls = np.array(historical_monthly_calls['daily_calls']) # real data
# Welch’s t-test (does not assume equal variances)
t_stat, p_value = stats.ttest_ind(sim_calls, real_calls, equal_var=False)
# Thresholds
p_thresh = 0.05

# Mean difference and effect size
mean_diff = np.mean(sim_calls) - np.mean(real_calls)
pooled_std = np.sqrt((np.std(sim_calls, ddof=1) ** 2 + np.std(real_calls, ddof=1) ** 2) / 2)
cohen_d = mean_diff / pooled_std
# Thresholds
effect_size_fail_thresh = 0.5
# Will only fail if significance threshold is met and cohen's D is sufficiently large
if p_value < p_thresh and abs(cohen_d) > effect_size_fail_thresh:
pytest.fail(f"""[FAIL - COMPARISON WITH REALITY]
**Mean Daily Calls** significantly different between simulation and reality.
p={p_value:.4f}.
Cohen's d={cohen_d:.2f}.
Sim mean: {np.mean(sim_calls):.2f}
Real mean: {np.mean(real_calls):.2f}.
Mean diff: {mean_diff:.2f}.""")
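The same pattern works for the other tests in the table - here's a minimal sketch using scipy's ks_2samp and chisquare functions (the values are made up purely for illustration):

import numpy as np
from scipy import stats

# KS 2-sample test: do simulated and historical durations plausibly come from the same distribution?
sim_durations = np.array([5.2, 7.1, 6.3, 8.0, 6.9])    # simulated on-scene durations (made up)
real_durations = np.array([5.5, 6.8, 6.0, 9.1, 7.2])   # historical on-scene durations (made up)
ks_stat, ks_p = stats.ks_2samp(sim_durations, real_durations)

# Chi-squared test: do simulated category counts match the historical frequencies?
# (counts must sum to the same total for stats.chisquare)
sim_counts = np.array([40, 35, 25])    # e.g. calls per priority level in the simulation (made up)
real_counts = np.array([42, 33, 25])   # e.g. calls per priority level in reality (made up)
chi_stat, chi_p = stats.chisquare(f_obs=sim_counts, f_exp=real_counts)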
A lot more than before!
Custom fonts appear to be supported via downloaded font files
The sidebar theme is fully customizable too.
So how would we build up a new candidate combination?
LSOAs (or any other geography) have a concept of neighbours (something sharing a border)
Here, we start with a territory that’s pretty central in the existing territories
On each step, it
This randomness ensures we don’t end up with every player owning the same number of units of territory each time.
We can then generate multiple allocations, which will all (probably) differ.
We can then score each solution on a metric
Rather than just randomly generating thousands of candidate solutions, we may move to a better solution faster by varying the best random solutions.
With some more boundary logic, we can create new possible solutions that are a variation on our best solution from the previous step.
We then evaluate again.
And - in theory - we’d keep going for a set number of generations or a defined amount of compute time - and see how good a solution we can come up with!
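Pulling those steps together, here's a toy sketch of that loop (not the actual HSMA code - the allocation routine, scoring metric and variation logic are all stand-in placeholders rather than the real boundary/neighbour logic):

import random

def random_allocation(n_units, n_players, rng):
    # Stand-in for growing territories outwards from central units
    return [rng.randrange(n_players) for _ in range(n_units)]

def score(allocation, n_players):
    # Placeholder metric: how unevenly the units are split (lower is better)
    counts = [allocation.count(p) for p in range(n_players)]
    return max(counts) - min(counts)

def vary(allocation, n_players, rng, n_changes=3):
    # Create a variation on a good solution by reassigning a few units
    new_allocation = allocation.copy()
    for _ in range(n_changes):
        new_allocation[rng.randrange(len(new_allocation))] = rng.randrange(n_players)
    return new_allocation

rng = random.Random(42)
n_units, n_players, n_candidates, n_generations = 100, 4, 20, 10

# Start from a batch of random candidate allocations...
candidates = [random_allocation(n_units, n_players, rng) for _ in range(n_candidates)]

for generation in range(n_generations):
    # ...score them and keep the best...
    best = min(candidates, key=lambda a: score(a, n_players))
    # ...then build the next batch as variations on that best solution
    candidates = [vary(best, n_players, rng) for _ in range(n_candidates)]

print("Best score found:", score(best, n_players))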