13 Reproducibility

One great thing about DES is controlled randomness.

Now, computers aren’t very good at being completely random!

What we do is give it a ‘seed’ to start from when we are sampling from a distribution..

If our seed is 1, maybe the arrival times sampled from the distribution will be

5 minutes, 2 minutes, 3 minutes, 6 minutes, 5 minutes

If our seed is 101, maybe the arrival times will be

10 minutes, 2 minutes, 5 minutes, 5 minutes, 3 minutes

Why is this good?

We can either - Change the random number and see how the same system can cope with different patterns of arrivals - Keep the random number the same and see how changing the system affects performance

Let’s go back to our branching model.

At the moment, we have not set a seed anywhere explicitly.

What the random package will do is default the seed to being the date and time that the code is run at - this helps to ensure the results are random, but this can then cause us some problems.

When we do 100 runs, change the parameters (say add an extra nurse) and then do another 100 runs, we don’t currently know how much of the difference in results is due to the change in system (the extra nurse), and how much is due to random variation in the number and timing of arrivals, how long their consultations take, and the chance of them going on to the second consultation.

So let’s set a random seed so that we fix the patterns and arrivals.

13.1 Exploring ways of coding in reproducibility

Warning

The method shown in this section has limitations - but reading through this section will help you understand more about seeds and why the method in Section 13.2 (“A robust way to ensure controlled randomness”) is better.

The best place to do this is in our trial class.

In our run_trial method within that class, we can set the seed so that it matches the run number.

This will ensure each run has a different seed, but that the seed is the same across different runs.

def run_trial(self):
    # Run the simulation for the number of runs specified in g class.
    # For each run, we create a new instance of the Model class and call its
    # run method, which sets everything else in motion.  Once the run has
    # completed, we grab out the stored run results (just mean queuing time
    # here) and store it against the run number in the trial results
    # dataframe.
    for run in range(g.number_of_runs):
        random.seed(run)

        my_model = Model(run)
        my_model.run()

        self.df_trial_results.loc[run] = [my_model.mean_q_time_recep,
                                          my_model.mean_q_time_nurse,
                                          my_model.mean_q_time_doctor]

    # Once the trial (ie all runs) has completed, print the final results
    self.print_trial_results()

Let’s look at the output now.

Let’s run 100 trials and look at the outputs.

# Create an instance of the Trial class
my_trial = Trial()

# Call the run_trial method of our Trial object
my_trial.run_trial()

Trial Results
            Arrivals  Mean Q Time Recep  Mean Q Time Nurse  Mean Q Time Doctor
Run Number                                                                    
0              105.0               0.48               6.77               51.94
1              117.0               0.90              89.23               10.82
2              110.0               1.86              27.21               33.57
3              115.0               0.92              78.11               55.77
4              104.0               1.52              46.56               94.60
...              ...                ...                ...                 ...
95             127.0               0.62              96.29               69.36
96             132.0               1.01              54.03               34.93
97             126.0               2.20              67.00               76.15
98             152.0               1.67              89.61               10.58
99             112.0               0.70              22.13               82.26

[100 rows x 4 columns]
Arrivals              120.44
Mean Q Time Recep       1.31
Mean Q Time Nurse      64.46
Mean Q Time Doctor     35.10
dtype: float64

Now let’s run 100 trials again. Are the results the same?

Let’s run 100 trials and look at the outputs.

# Create an instance of the Trial class
my_trial = Trial()

# Call the run_trial method of our Trial object
my_trial.run_trial()

Trial Results
            Arrivals  Mean Q Time Recep  Mean Q Time Nurse  Mean Q Time Doctor
Run Number                                                                    
0              105.0               0.48               6.77               51.94
1              117.0               0.90              89.23               10.82
2              110.0               1.86              27.21               33.57
3              115.0               0.92              78.11               55.77
4              104.0               1.52              46.56               94.60
...              ...                ...                ...                 ...
95             127.0               0.62              96.29               69.36
96             132.0               1.01              54.03               34.93
97             126.0               2.20              67.00               76.15
98             152.0               1.67              89.61               10.58
99             112.0               0.70              22.13               82.26

[100 rows x 4 columns]
Arrivals              120.44
Mean Q Time Recep       1.31
Mean Q Time Nurse      64.46
Mean Q Time Doctor     35.10
dtype: float64

Yes!

Now let’s compare this when we start changing the number of nurses.

This is going to change the queue times for nurses and, by extension, for doctors (as people will be turning up to the doctors at different times).

However, the number of arrivals should remain unchanged.

Trial Results
            Arrivals  Mean Q Time Recep  Mean Q Time Nurse  Mean Q Time Doctor
Run Number                                                                    
0              105.0               0.48               6.77               51.94
1              117.0               0.90              89.23               10.82
2              110.0               1.86              27.21               33.57
3              115.0               0.92              78.11               55.77
4              104.0               1.52              46.56               94.60
...              ...                ...                ...                 ...
95             127.0               0.62              96.29               69.36
96             132.0               1.01              54.03               34.93
97             126.0               2.20              67.00               76.15
98             152.0               1.67              89.61               10.58
99             112.0               0.70              22.13               82.26

[100 rows x 4 columns]
Arrivals              120.44
Mean Q Time Recep       1.31
Mean Q Time Nurse      64.46
Mean Q Time Doctor     35.10
dtype: float64

Trial Results
            Arrivals  Mean Q Time Recep  Mean Q Time Nurse  Mean Q Time Doctor
Run Number                                                                    
0              106.0               0.96               0.78               52.59
1               99.0               0.43               2.29               22.72
2              116.0               2.96               2.84               93.46
3              123.0               0.86               1.27               57.58
4              114.0               2.40               3.53               68.42
...              ...                ...                ...                 ...
95             123.0               1.39               3.73               45.02
96             138.0               0.97               1.95              102.04
97             116.0               1.33               2.34               24.49
98             125.0               1.07               1.59               40.84
99             129.0               1.47               5.90               17.22

[100 rows x 4 columns]
Arrivals              118.39
Mean Q Time Recep       1.23
Mean Q Time Nurse       2.98
Mean Q Time Doctor     53.23
dtype: float64

Unfortunately, what we wanted (and needed) to happen, hasn’t.

Instead, we are seeing that the number of arrivals are changing too.

Note

This is because of the way random number generation occurs.

The order the random numbers are generated in matters - and as the order of events changes (in this case, as we have more nurses, they can see more patients quicker, changing the order that subsequent events happen in).

Let’s investigate this with two examples.

random.seed(42)

print(f"1: inter-arrival time 1 {random.expovariate(1.0 / g.patient_inter):.2f}")
print(f"3: inter-arrival time 2 {random.expovariate(1.0 / g.patient_inter):.2f}")
print(f"2: reception consult time 1 {random.expovariate(1.0 / g.mean_reception_time):.2f}")
print(f"4: inter-arrival time 3 {random.expovariate(1.0 / g.patient_inter):.2f}")

1: inter-arrival time 1 5.10
3: inter-arrival time 2 0.13
2: reception consult time 1 0.64
4: inter-arrival time 3 1.26

random.seed(42)

print(f"1: inter-arrival time 1 {random.expovariate(1.0 / g.patient_inter):.2f}")
print(f"2: inter-arrival time 2 {random.expovariate(1.0 / g.patient_inter):.2f}")
print(f"4: inter-arrival time 3 {random.expovariate(1.0 / g.patient_inter):.2f}")
print(f"3: reception consult time  1 {random.expovariate(1.0 / g.mean_reception_time):.2f}")

1: inter-arrival time 1 5.10
2: inter-arrival time 2 0.13
4: inter-arrival time 3 1.61
3: reception consult time  1 0.51

We can see that the first two inter-arrival times are consistent. However, when we swap the order of generating the next inter-arrival time and generating a length of time for someone to spend with a receptionist, we see that the times are different.

Warning

So while this method is ok just to ensure that a single output remains consistent when you rerun your analysis, it’s no good for ensuring you’re making good comparisons across different simulation scenarios.

So how can we do this?

13.2 A robust way to ensure controlled randomness

Effectively, we want separate seeds for the random number generator for each separate type of event we are generating random numbers for.

This means that we have separate random number streams for the different parts of our process - our inter-arrival times - our consult times - our probabilities

The easiest way to implement this is to switch from using the random library to using the distributions provided in simtools.

We will replace random.expovariate with the Exponential class.

First, we need to import this class.

from sim_tools.distributions import Exponential

We now set up the distributions when initialising the model.

class Model:
    # Constructor to set up the model for a run.  We pass in a run number when
    # we create a new model.
    def __init__(self, run_number):
        # Create a SimPy environment in which everything will live
        self.env = simpy.Environment()

        # Create a patient counter (which we'll use as a patient ID)
        self.patient_counter = 0

        # Create our resources
        self.receptionist = simpy.Resource(
            self.env, capacity=g.number_of_receptionists
        )
        self.nurse = simpy.Resource(self.env, capacity=g.number_of_nurses)
        self.doctor = simpy.Resource(
            self.env, capacity=g.number_of_doctors)

        # Store the passed in run number
        self.run_number = run_number

        # Create a new Pandas DataFrame that will store some results against
        # the patient ID (which we'll use as the index).
        self.results_df = pd.DataFrame()
        self.results_df["Patient ID"] = [1]
        self.results_df["Q Time Recep"] = [0.0]
        self.results_df["Time with Recep"] = [0.0]
        self.results_df["Q Time Nurse"] = [0.0]
        self.results_df["Time with Nurse"] = [0.0]
        self.results_df["Q Time Doctor"] = [0.0]
        self.results_df["Time with Doctor"] = [0.0]
        self.results_df.set_index("Patient ID", inplace=True)

        # Create an attribute to store the mean queuing times across this run of
        # the model
        self.mean_q_time_recep = 0
        self.mean_q_time_nurse = 0
        self.mean_q_time_doctor = 0

        ##NEW - initialise distributions
        self.patient_inter_arrival_dist = Exponential(mean = g.patient_inter, random_seed = self.run_number*2)
        self.patient_reception_time_dist = Exponential(mean = g.mean_reception_time, random_seed = self.run_number*3)
        self.nurse_consult_time_dist = Exponential(mean = g.mean_n_consult_time, random_seed = self.run_number*4)
        self.doctor_consult_time_dist = Exponential(mean = g.mean_d_consult_time, random_seed = self.run_number*5)

Warning

Note that the value we pass to initialise the Exponential variable here is just the mean time.

When we were using random.expovariate, we passed 1 dividided by the mean time.

Next, everywhere we have previously used random.expovariate, we replace this with the .sample() method of our newly initialised distributions.

For example

sampled_doctor_act_time = random.expovariate(
    1.0 / g.mean_d_consult_time
)

becomes

sampled_doctor_act_time = self.doctor_consult_time_dist.sample()

13.2.1 The full code

Let’s update all the code and run our previous experiment again.

Click here to view the full code

import simpy
import random
import pandas as pd
from sim_tools.distributions import Exponential ##NEW

# Class to store global parameter values.  We don't create an instance of this
# class - we just refer to the class blueprint itself to access the numbers
# inside.
class g:
    patient_inter = 5
    mean_reception_time = 2
    mean_n_consult_time = 6
    mean_d_consult_time = 20
    number_of_receptionists = 1
    number_of_nurses = 1
    number_of_doctors = 2
    prob_seeing_doctor = 0.6
    sim_duration = 600
    number_of_runs = 100

# Class representing patients coming in to the clinic.
class Patient:
    def __init__(self, p_id):
        self.id = p_id
        self.q_time_recep = 0
        self.q_time_nurse = 0
        self.q_time_doctor = 0

# Class representing our model of the clinic.
class Model:
    # Constructor to set up the model for a run.  We pass in a run number when
    # we create a new model.
    def __init__(self, run_number):
        # Create a SimPy environment in which everything will live
        self.env = simpy.Environment()

        # Create a patient counter (which we'll use as a patient ID)
        self.patient_counter = 0

        # Create our resources
        self.receptionist = simpy.Resource(
            self.env, capacity=g.number_of_receptionists
        )
        self.nurse = simpy.Resource(self.env, capacity=g.number_of_nurses)
        self.doctor = simpy.Resource(
            self.env, capacity=g.number_of_doctors)

        # Store the passed in run number
        self.run_number = run_number

        # Create a new Pandas DataFrame that will store some results against
        # the patient ID (which we'll use as the index).
        self.results_df = pd.DataFrame()
        self.results_df["Patient ID"] = [1]
        self.results_df["Q Time Recep"] = [0.0]
        self.results_df["Time with Recep"] = [0.0]
        self.results_df["Q Time Nurse"] = [0.0]
        self.results_df["Time with Nurse"] = [0.0]
        self.results_df["Q Time Doctor"] = [0.0]
        self.results_df["Time with Doctor"] = [0.0]
        self.results_df.set_index("Patient ID", inplace=True)

        # Create an attribute to store the mean queuing times across this run of
        # the model
        self.mean_q_time_recep = 0
        self.mean_q_time_nurse = 0
        self.mean_q_time_doctor = 0

        self.patient_inter_arrival_dist = Exponential(mean = g.patient_inter, random_seed = self.run_number*2)
        self.patient_reception_time_dist = Exponential(mean = g.mean_reception_time, random_seed = self.run_number*3)
        self.nurse_consult_time_dist = Exponential(mean = g.mean_n_consult_time, random_seed = self.run_number*4)
        self.doctor_consult_time_dist = Exponential(mean = g.mean_d_consult_time, random_seed = self.run_number*5)

    # A generator function that represents the DES generator for patient
    # arrivals
    def generator_patient_arrivals(self):
        # We use an infinite loop here to keep doing this indefinitely whilst
        # the simulation runs
        while True:
            # Increment the patient counter by 1 (this means our first patient
            # will have an ID of 1)
            self.patient_counter += 1

            # Create a new patient - an instance of the Patient Class we
            # defined above.  Remember, we pass in the ID when creating a
            # patient - so here we pass the patient counter to use as the ID.
            p = Patient(self.patient_counter)

            # Tell SimPy to start up the attend_clinic generator function with
            # this patient (the generator function that will model the
            # patient's journey through the system)
            self.env.process(self.attend_clinic(p))

            # Randomly sample the time to the next patient arriving.  Here, we
            # sample from an exponential distribution (common for inter-arrival
            # times), and pass in a lambda value of 1 / mean.  The mean
            # inter-arrival time is stored in the g class.
            sampled_inter = self.patient_inter_arrival_dist.sample() ##NEW

            # Freeze this instance of this function in place until the
            # inter-arrival time we sampled above has elapsed.  Note - time in
            # SimPy progresses in "Time Units", which can represent anything
            # you like (just make sure you're consistent within the model)
            yield self.env.timeout(sampled_inter)

    # A generator function that represents the pathway for a patient going
    # through the clinic.
    # The patient object is passed in to the generator function so we can
    # extract information from / record information to it
    def attend_clinic(self, patient):
        start_q_recep = self.env.now

        with self.receptionist.request() as req:
            yield req

            end_q_recep = self.env.now

            patient.q_time_recep = end_q_recep - start_q_recep

            sampled_recep_act_time = self.patient_reception_time_dist.sample() ##NEW

            self.results_df.at[patient.id, "Q Time Recep"] = (
                 patient.q_time_recep
            )
            self.results_df.at[patient.id, "Time with Recep"] = (
                 sampled_recep_act_time
            )

            yield self.env.timeout(sampled_recep_act_time)

        # Here's where the patient finishes with the receptionist, and starts
        # queuing for the nurse

        # Record the time the patient started queuing for a nurse
        start_q_nurse = self.env.now

        # This code says request a nurse resource, and do all of the following
        # block of code with that nurse resource held in place (and therefore
        # not usable by another patient)
        with self.nurse.request() as req:
            # Freeze the function until the request for a nurse can be met.
            # The patient is currently queuing.
            yield req

            # When we get to this bit of code, control has been passed back to
            # the generator function, and therefore the request for a nurse has
            # been met.  We now have the nurse, and have stopped queuing, so we
            # can record the current time as the time we finished queuing.
            end_q_nurse = self.env.now

            # Calculate the time this patient was queuing for the nurse, and
            # record it in the patient's attribute for this.
            patient.q_time_nurse = end_q_nurse - start_q_nurse

            # Now we'll randomly sample the time this patient with the nurse.
            # Here, we use an Exponential distribution for simplicity, but you
            # would typically use a Log Normal distribution for a real model
            # (we'll come back to that).  As with sampling the inter-arrival
            # times, we grab the mean from the g class, and pass in 1 / mean
            # as the lambda value.
            sampled_nurse_act_time = self.nurse_consult_time_dist.sample() ##NEW

            # Here we'll store the queuing time for the nurse and the sampled
            # time to spend with the nurse in the results DataFrame against the
            # ID for this patient.  In real world models, you may not want to
            # bother storing the sampled activity times - but as this is a
            # simple model, we'll do it here.
            # We use a handy property of pandas called .at, which works a bit
            # like .loc.  .at allows us to access (and therefore change) a
            # particular cell in our DataFrame by providing the row and column.
            # Here, we specify the row as the patient ID (the index), and the
            # column for the value we want to update for that patient.
            self.results_df.at[patient.id, "Q Time Nurse"] = (
                patient.q_time_nurse)
            self.results_df.at[patient.id, "Time with Nurse"] = (
                sampled_nurse_act_time)

            # Freeze this function in place for the activity time we sampled
            # above.  This is the patient spending time with the nurse.
            yield self.env.timeout(sampled_nurse_act_time)

            # When the time above elapses, the generator function will return
            # here.  As there's nothing more that we've written, the function
            # will simply end.  This is a sink.  We could choose to add
            # something here if we wanted to record something - e.g. a counter
            # for number of patients that left, recording something about the
            # patients that left at a particular sink etc.

        # Conditional logic to see if patient goes on to see doctor
        # We sample from the uniform distribution between 0 and 1.  If the value
        # is less than the probability of seeing a doctor (stored in g Class)
        # then we say the patient sees a doctor.
        # If not, this block of code won't be run and the patient will just
        # leave the system (we could add in an else if we wanted a branching
        # path to another activity instead)
        if random.uniform(0,1) < g.prob_seeing_doctor:
            start_q_doctor = self.env.now

            with self.doctor.request() as req:
                yield req

                end_q_doctor = self.env.now

                patient.q_time_doctor = end_q_doctor - start_q_doctor

                sampled_doctor_act_time = self.nurse_consult_time_dist.sample() ##NEW

                self.results_df.at[patient.id, "Q Time Doctor"] = (
                    patient.q_time_doctor
                )
                self.results_df.at[patient.id, "Time with Doctor"] = (
                    sampled_doctor_act_time
                )

                yield self.env.timeout(sampled_doctor_act_time)

    # This method calculates results over a single run.  Here we just calculate
    # a mean, but in real world models you'd probably want to calculate more.
    def calculate_run_results(self):
        # Take the mean of the queuing times across patients in this run of the
        # model.
        self.mean_q_time_recep = self.results_df["Q Time Recep"].mean()
        self.mean_q_time_nurse = self.results_df["Q Time Nurse"].mean()
        self.mean_q_time_doctor = self.results_df["Q Time Doctor"].mean()

    # The run method starts up the DES entity generators, runs the simulation,
    # and in turns calls anything we need to generate results for the run
    def run(self):
        # Start up our DES entity generators that create new patients.  We've
        # only got one in this model, but we'd need to do this for each one if
        # we had multiple generators.
        self.env.process(self.generator_patient_arrivals())

        # Run the model for the duration specified in g class
        self.env.run(until=g.sim_duration)

        # Now the simulation run has finished, call the method that calculates
        # run results
        self.calculate_run_results()

        # Print the run number with the patient-level results from this run of
        # the model
        return (self.results_df)

# Class representing a Trial for our simulation - a batch of simulation runs.
class Trial:
    # The constructor sets up a pandas dataframe that will store the key
    # results from each run against run number, with run number as the index.
    def  __init__(self):
        self.df_trial_results = pd.DataFrame()
        self.df_trial_results["Run Number"] = [0]
        self.df_trial_results["Arrivals"] = [0]
        self.df_trial_results["Mean Q Time Recep"] = [0.0]
        self.df_trial_results["Mean Q Time Nurse"] = [0.0]
        self.df_trial_results["Mean Q Time Doctor"] = [0.0]
        self.df_trial_results.set_index("Run Number", inplace=True)

    # Method to print out the results from the trial.  In real world models,
    # you'd likely save them as well as (or instead of) printing them
    def print_trial_results(self):
        print ("Trial Results")
        print (self.df_trial_results.round(2))
        print(self.df_trial_results.mean().round(2))

    # Method to run a trial
    def run_trial(self):
        print(f"{g.number_of_receptionists} receptionists, {g.number_of_nurses} nurses, {g.number_of_doctors} doctors") ## NEW
        print("") ## NEW: Print a blank line
        # Run the simulation for the number of runs specified in g class.
        # For each run, we create a new instance of the Model class and call its
        # run method, which sets everything else in motion.  Once the run has
        # completed, we grab out the stored run results (just mean queuing time
        # here) and store it against the run number in the trial results
        # dataframe.
        for run in range(g.number_of_runs):
            random.seed(run)

            my_model = Model(run)
            patient_level_results = my_model.run()

            self.df_trial_results.loc[run] = [
                len(patient_level_results),
                my_model.mean_q_time_recep,
                my_model.mean_q_time_nurse,
                my_model.mean_q_time_doctor
                ]

        # Once the trial (ie all runs) has completed, print the final results
        self.print_trial_results()

13.2.2 Evaluating the outputs

1 receptionists, 1 nurses, 2 doctors

Trial Results
            Arrivals  Mean Q Time Recep  Mean Q Time Nurse  Mean Q Time Doctor
Run Number                                                                    
0              102.0               0.00              57.19                1.15
1              125.0               1.84             144.69                0.02
2              112.0               0.85              15.30                1.13
3              120.0               1.08              82.67                0.04
4              132.0               1.94             107.47                0.51
...              ...                ...                ...                 ...
95              91.0               0.35               9.08                0.15
96             122.0               1.09              63.11                0.32
97              97.0               0.58              21.86                0.10
98             133.0               1.86             124.94                0.09
99             122.0               0.57              63.14                0.66

[100 rows x 4 columns]
Arrivals              122.39
Mean Q Time Recep       1.27
Mean Q Time Nurse      61.35
Mean Q Time Doctor      0.48
dtype: float64

1 receptionists, 2 nurses, 2 doctors

Trial Results
            Arrivals  Mean Q Time Recep  Mean Q Time Nurse  Mean Q Time Doctor
Run Number                                                                    
0              102.0               0.00               2.49                1.01
1              125.0               1.84               3.55                1.56
2              112.0               0.85               3.61                0.59
3              120.0               1.08               3.43                0.19
4              132.0               1.94               4.20                0.91
...              ...                ...                ...                 ...
95              91.0               0.35               1.08                0.07
96             122.0               1.09               2.20                0.73
97              97.0               0.58               1.09                1.04
98             133.0               1.86               4.19                1.41
99             122.0               0.57               1.86                0.85

[100 rows x 4 columns]
Arrivals              122.39
Mean Q Time Recep       1.27
Mean Q Time Nurse       2.97
Mean Q Time Doctor      0.89
dtype: float64

With these changes made, we can see that the number of arrivals and the queue time for the receptionists across the trials has remained consistent, while the waits for nurses and doctors have changed, but we can now be confident that this is because of alterations to the parameters - not uncontrolled randomness.