29  Using code to solve facility location problems

Right - now we know the combinations we want to test and how long it takes to get to each clinic location, we’ll need some sort of demand figures.

29.1 Demand data

Chances are your demand isn’t spread evenly across your potential service user catchment area.

  • Maybe certain LSOAs have more people of childbearing age, so maternity and paediatric services will get more demand from those areas
  • Maybe certain LSOAs have an older population, so stroke services are more centred around there
  • Maybe other LSOAs have a higher incidence of obesity, so you might expect more demand for type 2 diabetes services and retinopathy clinics

Your demand data can be formatted quite simply!

You just want to make sure that you’re using the same level of location data as in your travel matrix (e.g. they need to both use LSOA, or both use postcode sectors - not a mixture)

Warning

You might use historic demand figures as your starting point.

But always consider…

  • If you seem to have areas with no historic demand, is this actually a pattern, or is it just that the historic numbers are small and there’s an element of chance?
  • Is there some preexisting bias or issue with access that you might be perpetuating by using the historic figures?

How can you deal with these issues?

If you’re using historic data, can you bring in more years?

Alternatively, you could explore what happens when you just weight demand by the number of people in each LSOA.

If your service only covers a certain age range, you can find LSOA-level figures estimating the proportion of the population in that age group.

You might also just make predictions of demand based off something else. - Number of people in a target age demographic or of a particular gender in your LSOAs, multiplied by some factor - (e.g. 20% of women aged 18-34)

Just always consider whether what you’re doing is

  • appropriate
  • sensible
  • transparent
  • justifiable

Make it clear what data you have used and why!

Make the limitations of your conclusions explicit.

29.2 Evaluating the weighted average travel time

Right! We’ve got everything we need - except a way to evaluate the weighted average travel time…

Cast your mind back many many slides - when we present our options, we want to give more weight to the travel times from areas that have higher demand.

Let’s write some code to do this.

It’s something we’re going to want to reuse quite a lot - so we can turn it into a class with…

Attributes

  • our travel dataframe
  • our demand dataframe

Methods

  • Evaluate a solution
    • Limit the travel dataframe to a subset of candidate locations
    • Work out the smallest value per location (the ‘minimum cost’)
    • Return a dataframe with demand and min cost per region
  • Return the weighted average for the solution
  • Return the unweighted average for the solution
  • Return the maximum travel time for any locations in the solution

The FacilityLocationObjective code is modified from the Metapy library created by Tom Monks

https://github.com/health-data-science-OR/healthcare-logistics/tree/master/optimisation/metapy

29.2.0.1 The init method

class FacilityLocationObjective:
    def __init__(self, demand, travel_matrix, merge_col, demand_col):
        '''
        Store the demand and travel times

        Args:
            demand: pd.DataFrame:

            travel_matrix: pd.DataFrame:
        '''
        self.demand = demand.set_index(merge_col)
        self.travel_matrix = travel_matrix.set_index(merge_col)
        self.demand_col = demand_col

demand is our dataframe containing demand sources (e.g. LSOAs) as rows and the demand (e.g. yearly patients coming from that LSOA) as the single column

merge_col is the name of the column that is consistent between our demand and travel dataframes

travel_matrix is the dataframe with demand sources as rows and potential service locations as columns, with the travel cost (time) as the values in the cells

demand_col is a string referring to the column that contains the demand value in the demand dataframe

29.2.0.2 The evaluate_solution method

This method, given a list of sites in the form of their column indices (e.g. [1, 4, 5] for sites 2, 5 and 6), will return a dataframe with the demand and minimum travel time per row for this potential solution.

    def evaluate_solution(self, site_list):
        '''
        Args:
            site_list: list: column indices of solution to evaluate
                            (to apply to travel matrix)

        Returns:
            Pandas dataframe to pass to evaluation functions

        '''

        active_facilities = self.travel_matrix.iloc[:, site_list].copy()

        # Assume travel to closest facility
        # Need to drop the column that contains
        active_facilities['min_cost'] = active_facilities.min(axis=1)


        # Merge demand and travel times into a single DataFrame
        problem = self.demand.merge(active_facilities,
                                    left_index=True, right_index=True,
                                    how='inner')

        return problem.reset_index()

29.2.0.3 The generate_solution_metrics method

This method, given a list of sites in the form of their column indices (e.g. [1, 4, 5] for sites 2, 5 and 6) - Runs the evaluate_solution method - Returns a range of metrics relating to the

    def generate_solution_metrics(self, site_list):
        '''
        Calculates the weighted average travel time for selected sites

        Args:
            site_list: list or np.array: A list of site IDs as a list or array (e.g. [0, 3, 4])
            merge_col: string: The column name to use for merging the data.
            n_patients_or_referrals_col: string: The column name to use for the number of patients or referrals.

        Returns:
            A tuple containing the problem and the maximum travel time.
        '''
        problem = self.evaluate_solution(site_list)

        # Return weighted average
        weighted_average = np.average(problem['min_cost'], weights=problem[self.demand_col])
        unweighted_average = np.average(problem['min_cost'])
        max_travel = np.max(problem['min_cost'])

        return {
            'site_indices': site_list,
            'site_names': ", ".join(self.travel_matrix.columns[site_list].tolist()),
            'weighted_average': weighted_average,
            'unweighted_average': unweighted_average,
            'max': max_travel,
            'problem_df': problem
        }

:::{.callout-tip collapse=“True”} ### Click here to view the whole class

# Tweaked WeightedAverageObjective from Metapy package
# https://github.com/health-data-science-OR/healthcare-logistics/tree/master/optimisation/metapy
# Credit: Tom Monks

class FacilityLocationObjective:
    '''
    Encapsulates logic for calculation of
    metrics in a simple facility location problem

    Demand and travel matrices must have a common column

    demand: pd.dataframe:  Two column dataframe. One column should be labels for the
    demand locations (e.g. LSOA identifiers, postcodes). Second column should contain
    demand figures of some kind (e.g. number of historical cases)
    If demand assumed to be equal, all values in this column could be 1.

    travel_matrix: pd.dataframe: dataframe with columns representing sites
    and rows representing locations demand will come from.
    One column should be labels for the demand locations (e.g. LSOA identifiers, postcodes).
    All other values will be either distance or time in float form.
    No additional columns of information must be included or they will be used as part of the
    calculation of the lowest-cost solution, which may lead to incorrect results.
    '''
    def __init__(self, demand, travel_matrix, merge_col, demand_col):
        '''
        Store the demand and travel times

        Args:
            demand: pd.DataFrame:

            travel_matrix: pd.DataFrame:
        '''
        self.demand = demand.set_index(merge_col)
        self.travel_matrix = travel_matrix.set_index(merge_col)
        self.demand_col = demand_col


    def evaluate_solution(self, site_list):
        '''
        Calculates the

        Args:
            site_list: list: column indices of solution to evaluate
                            (to apply to travel matrix)

        Returns:
            Pandas dataframe to pass to evaluation functions

        '''

        active_facilities = self.travel_matrix.iloc[:, site_list].copy()

        # Assume travel to closest facility
        # Need to drop the column that contains
        active_facilities['min_cost'] = active_facilities.min(axis=1)


        # Merge demand and travel times into a single DataFrame
        problem = self.demand.merge(active_facilities,
                                    left_index=True, right_index=True,
                                    how='inner')

        return problem.reset_index()


    def generate_solution_metrics(self, site_list):
        '''
        Calculates the weighted average travel time for selected sites

        Args:
            site_list: list or np.array: A list of site IDs as a list or array (e.g. [0, 3, 4])
            merge_col: string: The column name to use for merging the data.
            n_patients_or_referrals_col: string: The column name to use for the number of patients or referrals.

        Returns:
            A tuple containing the problem and the maximum travel time.
        '''
        problem = self.evaluate_solution(site_list)

        # Return weighted average
        weighted_average = np.average(problem['min_cost'], weights=problem[self.demand_col])
        unweighted_average = np.average(problem['min_cost'])
        max_travel = np.max(problem['min_cost'])

        return {
            'site_indices': site_list,
            'site_names': ", ".join(self.travel_matrix.columns[site_list].tolist()),
            'weighted_average': weighted_average,
            'unweighted_average': unweighted_average,
            'max': max_travel,
            'problem_df': problem
        }

29.2.1 Setting up an instance of this class

Let’s first import the packages we will need throughout this section.

import pandas as pd
import geopandas
import contextily as cx
import matplotlib.pyplot as plt

We need to - import a dataset of demand - import (or create) a geodataframe of site locations - and import (or create) a travel matrix for this combination of sites and demand sources

brighton_demand = pd.read_csv("https://raw.githubusercontent.com/hsma-programme/h6_3d_facility_location_problems/main/h6_3d_facility_location_problems/example_code/brighton_demand.csv").drop(columns=["Unnamed: 0"])
brighton_demand.head()
LSOA demand
0 Brighton and Hove 027E 3627
1 Brighton and Hove 027F 2323
2 Brighton and Hove 027A 2596
3 Brighton and Hove 029E 3132
4 Brighton and Hove 029D 2883
brighton_sites = geopandas.read_file("https://raw.githubusercontent.com/hsma-programme/h6_3d_facility_location_problems/main/h6_3d_facility_location_problems/example_code/brighton_sites.geojson")
brighton_sites.head()
site geometry
0 Site 1 POINT (-0.19544 50.84511)
1 Site 2 POINT (-0.13365 50.84435)
2 Site 3 POINT (-0.10763 50.83347)
3 Site 4 POINT (-0.17652 50.83075)
4 Site 5 POINT (-0.11961 50.86597)
brighton_travel = pd.read_csv("https://raw.githubusercontent.com/hsma-programme/h6_3d_facility_location_problems/main/h6_3d_facility_location_problems/example_code/brighton_travel_matrix_driving.csv").drop(columns=["Unnamed: 0"])
brighton_travel.head()
LSOA Site 1 Site 2 Site 3 Site 4 Site 5 Site 6
0 Brighton and Hove 027E 773.93 527.69 444.29 491.85 607.54 554.91
1 Brighton and Hove 027F 757.39 499.11 517.57 561.07 578.97 538.37
2 Brighton and Hove 027A 763.24 601.38 517.98 410.40 681.23 557.35
3 Brighton and Hove 029E 743.62 651.72 660.36 379.72 731.58 557.58
4 Brighton and Hove 029D 665.85 664.65 658.20 313.00 744.50 570.51

Finally, we bring in our LSOA boundaries for plotting later.

lsoa_boundaries = geopandas.read_file("https://raw.githubusercontent.com/hsma-programme/h6_3d_facility_location_problems/main/h6_3d_facility_location_problems/example_code/LSOA_2011_Boundaries_Super_Generalised_Clipped_BSC_EW_V4.geojson")
lsoa_boundaries.head()
FID LSOA11CD LSOA11NM LSOA11NMW BNG_E BNG_N LONG LAT GlobalID geometry
0 1 E01000001 City of London 001A City of London 001A 532123 181632 -0.097140 51.51816 a758442e-7679-45d0-95a8-ed4c968ecdaa POLYGON ((532282.629 181906.496, 532248.250 18...
1 2 E01000002 City of London 001B City of London 001B 532480 181715 -0.091970 51.51882 861dbb53-dfaf-4f57-be96-4527e2ec511f POLYGON ((532746.814 181786.892, 532248.250 18...
2 3 E01000003 City of London 001C City of London 001C 532239 182033 -0.095320 51.52174 9f765b55-2061-484a-862b-fa0325991616 POLYGON ((532293.068 182068.422, 532419.592 18...
3 4 E01000005 City of London 001E City of London 001E 533581 181283 -0.076270 51.51468 a55c4c31-ef1c-42fc-bfa9-07c8f2025928 POLYGON ((533604.245 181418.129, 533743.689 18...
4 5 E01000006 Barking and Dagenham 016A Barking and Dagenham 016A 544994 184274 0.089317 51.53875 9cdabaa8-d9bd-4a94-bb3b-98a933ceedad POLYGON ((545271.918 184183.948, 545296.314 18...

We will then pass the travel and demand datasets into our FacilityLocationObjective; the site df will be used afterwards.

location_problem_brighton = FacilityLocationObjective(
    demand=brighton_demand,
    travel_matrix=brighton_travel,
    merge_col="LSOA",
    demand_col="demand"
    )

29.2.2 Outputs

29.2.2.1 evaluate_solution

This is the output of the evaluate_solution method.

And now we can easily pass in a range of solutions - including with different numbers of sites.

Note the column names changing to reflect the site indices we are passing in.

Now we bring in our code from an earlier chapter to calculate every possible combination of a certain number of facilities.

from itertools import combinations
import numpy as np

def all_combinations(n_facilities, p):
    facility = np.arange(n_facilities, dtype=np.uint8)
    return [np.array(a) for a in combinations(facility, p)]

We could even loop through this to find every combination with every possible number of facilities…

possible_combinations_brighton = all_combinations(
    len(location_problem_brighton.travel_matrix.columns),
    4
    )

possible_combinations_brighton
[array([0, 1, 2, 3], dtype=uint8),
 array([0, 1, 2, 4], dtype=uint8),
 array([0, 1, 2, 5], dtype=uint8),
 array([0, 1, 3, 4], dtype=uint8),
 array([0, 1, 3, 5], dtype=uint8),
 array([0, 1, 4, 5], dtype=uint8),
 array([0, 2, 3, 4], dtype=uint8),
 array([0, 2, 3, 5], dtype=uint8),
 array([0, 2, 4, 5], dtype=uint8),
 array([0, 3, 4, 5], dtype=uint8),
 array([1, 2, 3, 4], dtype=uint8),
 array([1, 2, 3, 5], dtype=uint8),
 array([1, 2, 4, 5], dtype=uint8),
 array([1, 3, 4, 5], dtype=uint8),
 array([2, 3, 4, 5], dtype=uint8)]

29.2.3 Obtaining solution metrics in a loop

Now we can loop through every possible combination and save the outputs.

outputs = []

for possible_solution in possible_combinations_brighton:
    outputs.append(
        location_problem_brighton.generate_solution_metrics(
        possible_solution
    )
)

And easily turn our list of dictionaries into a dataframe!

pd.DataFrame(outputs)
site_indices site_names weighted_average unweighted_average max problem_df
0 [0, 1, 2, 3] Site 1, Site 2, Site 3, Site 4 313.257086 312.565758 1001.33 LSOA demand Site 1 S...
1 [0, 1, 2, 4] Site 1, Site 2, Site 3, Site 5 374.295378 370.452061 1001.33 LSOA demand Site 1 S...
2 [0, 1, 2, 5] Site 1, Site 2, Site 3, Site 6 368.551505 355.460000 1001.33 LSOA demand Site 1 S...
3 [0, 1, 3, 4] Site 1, Site 2, Site 4, Site 5 397.907659 385.938061 1302.58 LSOA demand Site 1 S...
4 [0, 1, 3, 5] Site 1, Site 2, Site 4, Site 6 410.033802 388.522909 1371.46 LSOA demand Site 1 S...
5 [0, 1, 4, 5] Site 1, Site 2, Site 5, Site 6 460.215392 436.115273 1302.58 LSOA demand Site 1 S...
6 [0, 2, 3, 4] Site 1, Site 3, Site 4, Site 5 304.810704 306.939576 1001.33 LSOA demand Site 1 ...
7 [0, 2, 3, 5] Site 1, Site 3, Site 4, Site 6 313.630625 311.418303 1001.33 LSOA demand Site 1 ...
8 [0, 2, 4, 5] Site 1, Site 3, Site 5, Site 6 363.242023 354.221818 1001.33 LSOA demand Site 1 ...
9 [0, 3, 4, 5] Site 1, Site 4, Site 5, Site 6 414.657901 393.878606 1302.58 LSOA demand Site 1 S...
10 [1, 2, 3, 4] Site 2, Site 3, Site 4, Site 5 316.865395 319.753939 1001.33 LSOA demand Site 2 ...
11 [1, 2, 3, 5] Site 2, Site 3, Site 4, Site 6 311.756166 304.151576 1001.33 LSOA demand Site 2 ...
12 [1, 2, 4, 5] Site 2, Site 3, Site 5, Site 6 379.697853 370.415455 1001.33 LSOA demand Site 2 ...
13 [1, 3, 4, 5] Site 2, Site 4, Site 5, Site 6 399.681921 382.101333 1302.58 LSOA demand Site 2 S...
14 [2, 3, 4, 5] Site 3, Site 4, Site 5, Site 6 306.446684 302.913394 1001.33 LSOA demand Site 3 ...

Then it’s easy to find the best combinations and tidy up the output table.

pd.DataFrame(outputs).sort_values('weighted_average').round(1)
site_indices site_names weighted_average unweighted_average max problem_df
6 [0, 2, 3, 4] Site 1, Site 3, Site 4, Site 5 304.8 306.9 1001.3 LSOA demand Site 1 ...
14 [2, 3, 4, 5] Site 3, Site 4, Site 5, Site 6 306.4 302.9 1001.3 LSOA demand Site 3 ...
11 [1, 2, 3, 5] Site 2, Site 3, Site 4, Site 6 311.8 304.2 1001.3 LSOA demand Site 2 ...
0 [0, 1, 2, 3] Site 1, Site 2, Site 3, Site 4 313.3 312.6 1001.3 LSOA demand Site 1 S...
7 [0, 2, 3, 5] Site 1, Site 3, Site 4, Site 6 313.6 311.4 1001.3 LSOA demand Site 1 ...
10 [1, 2, 3, 4] Site 2, Site 3, Site 4, Site 5 316.9 319.8 1001.3 LSOA demand Site 2 ...
8 [0, 2, 4, 5] Site 1, Site 3, Site 5, Site 6 363.2 354.2 1001.3 LSOA demand Site 1 ...
2 [0, 1, 2, 5] Site 1, Site 2, Site 3, Site 6 368.6 355.5 1001.3 LSOA demand Site 1 S...
1 [0, 1, 2, 4] Site 1, Site 2, Site 3, Site 5 374.3 370.5 1001.3 LSOA demand Site 1 S...
12 [1, 2, 4, 5] Site 2, Site 3, Site 5, Site 6 379.7 370.4 1001.3 LSOA demand Site 2 ...
3 [0, 1, 3, 4] Site 1, Site 2, Site 4, Site 5 397.9 385.9 1302.6 LSOA demand Site 1 S...
13 [1, 3, 4, 5] Site 2, Site 4, Site 5, Site 6 399.7 382.1 1302.6 LSOA demand Site 2 S...
4 [0, 1, 3, 5] Site 1, Site 2, Site 4, Site 6 410.0 388.5 1371.5 LSOA demand Site 1 S...
9 [0, 3, 4, 5] Site 1, Site 4, Site 5, Site 6 414.7 393.9 1302.6 LSOA demand Site 1 S...
5 [0, 1, 4, 5] Site 1, Site 2, Site 5, Site 6 460.2 436.1 1302.6 LSOA demand Site 1 S...

And, since we stored the combinations in our dataframe too, we can easily pull this out by chaining together a few steps.

pd.DataFrame(outputs).sort_values('weighted_average').head(1)['site_names'].values
array(['Site 1, Site 3, Site 4, Site 5'], dtype=object)

29.2.4 Plotting the best solution

We can now put this all together with the map plotting skills we’ve learned over the last few sessions.

When plotting our sites as well, we can ensure we are only plotting the sites included in our solution - see the highlighted sections of the code below.

Let’s explore what the output of the first line of code is:

best_solution = pd.DataFrame(outputs).sort_values('weighted_average').head(1)
best_solution_df = best_solution['problem_df'].values[0]

best_solution_df
LSOA demand Site 1 Site 3 Site 4 Site 5 min_cost
0 Brighton and Hove 027E 3627 773.93 444.29 491.85 607.54 444.29
1 Brighton and Hove 027F 2323 757.39 517.57 561.07 578.97 517.57
2 Brighton and Hove 027A 2596 763.24 517.98 410.40 681.23 410.40
3 Brighton and Hove 029E 3132 743.62 660.36 379.72 731.58 379.72
4 Brighton and Hove 029D 2883 665.85 658.20 313.00 744.50 313.00
... ... ... ... ... ... ... ...
160 Brighton and Hove 012A 2497 446.54 1108.11 519.16 626.02 446.54
161 Brighton and Hove 005C 2570 469.80 1008.24 569.40 526.15 469.80
162 Brighton and Hove 012B 2051 464.53 1132.60 537.15 650.51 464.53
163 Brighton and Hove 005A 1164 567.49 1105.93 664.11 623.84 567.49
164 Brighton and Hove 005B 1097 495.50 1033.94 595.10 551.85 495.50

165 rows × 7 columns

This line then ensures we’re just pulling out the site indices for the best sites, and filtering our site dataframe to just those.

brighton_sites_bng = brighton_sites.to_crs('EPSG:27700').iloc[best_solution["site_indices"].values[0]]

brighton_sites_bng
site geometry
0 Site 1 POINT (527142.275 106616.053)
2 Site 3 POINT (533356.778 105476.782)
3 Site 4 POINT (528513.424 105052.430)
4 Site 5 POINT (532421.163 109069.196)

Now let’s do the plotting.

best_solution_df = pd.DataFrame(outputs).sort_values('weighted_average').head(1)['problem_df'].values[0]

nearest_site_travel_brighton_gdf = pd.merge(
    lsoa_boundaries,
    best_solution_df,
    right_on = "LSOA",
    left_on = "LSOA11NM"
)

nearest_site_travel_brighton_gdf["min_cost_minutes"] = nearest_site_travel_brighton_gdf["min_cost"] / 60

ax = nearest_site_travel_brighton_gdf.plot(
    "min_cost_minutes",
    legend=True,
    cmap="Blues",
    alpha=0.7,
    edgecolor="black",
    linewidth=0.5,
    figsize=(12,6)
    )

brighton_sites_bng = brighton_sites.to_crs('EPSG:27700').iloc[best_solution["site_indices"].values[0]]

hospital_points = brighton_sites_bng.plot(ax=ax, color='magenta', markersize=60)

cx.add_basemap(ax, crs=nearest_site_travel_brighton_gdf.crs.to_string(), zoom=14)

for x, y, label in zip(brighton_sites_bng.geometry.x,
                      brighton_sites_bng.geometry.y,
                      brighton_sites_bng.site):
    ax.annotate(label, xy=(x,y), xytext=(10,3), textcoords="offset points", bbox=dict(facecolor='white'))

ax.axis('off')

plt.title("Travel Time (driving - minutes) for best sites in Brighton")
Text(0.5, 1.0, 'Travel Time (driving - minutes) for best sites in Brighton')

29.2.5 Plotting all solutions

And then it doesn’t take much effort to create a map showing weighted average travel times for every possible solution!

The key thing is that we just iterate through our dataframe of all solutions - pulling one row out at a time.

If we order them by weighted average first, the map in the top left is the best solution.

Then it’s just our usual plot code - making sure to specify the axis we are plotting on to.

Another benefit of saving all these different things is that we can then easily add the weighted average travel time to the title of each plot!

fig, axs = plt.subplots(3, 5, figsize=(30, 15))

for i, ax in enumerate(fig.axes):

    solution = pd.DataFrame(outputs).sort_values('weighted_average').iloc[[i]]
    solution_df = solution['problem_df'].values[0]

    nearest_site_travel_brighton_gdf = pd.merge(
        lsoa_boundaries,
        solution_df,
        right_on = "LSOA",
        left_on = "LSOA11NM"
    )

    nearest_site_travel_brighton_gdf["min_cost_minutes"] = nearest_site_travel_brighton_gdf["min_cost"] / 60

    ax = nearest_site_travel_brighton_gdf.plot(
        "min_cost_minutes",
        legend=True,
        cmap="Blues",
        alpha=0.7,
        edgecolor="black",
        linewidth=0.5,
        figsize=(12,6),
        ax=ax
        )

    brighton_sites_bng = brighton_sites.to_crs('EPSG:27700').iloc[solution["site_indices"].values[0]]

    hospital_points = brighton_sites_bng.plot(ax=ax, color='magenta', markersize=60)

    for x, y, label in zip(brighton_sites_bng.geometry.x,
                        brighton_sites_bng.geometry.y,
                        brighton_sites_bng.site):
        ax.annotate(label, xy=(x,y), xytext=(10,3), textcoords="offset points", bbox=dict(facecolor='white'))

    ax.axis('off')

    weighted_travel_time_solution = solution["weighted_average"].values[0]
    weighted_travel_time_solution_minutes = (weighted_travel_time_solution / 60).round(2)

    ax.set_title(f"Weighted Average:\n{weighted_travel_time_solution_minutes} minutes")

29.2.6 Other ways to display the output

Consider other ways to display the outputs too.

import plotly.express as px

outputs_df = pd.DataFrame(outputs)
outputs_df['weighted_average_minutes'] = (outputs_df['weighted_average']/60).round(2)

px.bar(
    data_frame=outputs_df.sort_values("weighted_average_minutes", ascending=False),
    y="site_names",
    x="weighted_average_minutes",
    title="Possible Site Combinations"
)

29.3 Full Code Example

Finally, here is a full copyable code example for this section.

from itertools import combinations
import numpy as np
import pandas as pd
import geopandas
import contextily as cx
import matplotlib.pyplot as plt

# Tweaked WeightedAverageObjective from Metapy package
# https://github.com/health-data-science-OR/healthcare-logistics/tree/master/optimisation/metapy
# Credit: Tom Monks

class FacilityLocationObjective:
    '''
    Encapsulates logic for calculation of
    metrics in a simple facility location problem

    Demand and travel matrices must have a common column

    demand: pd.dataframe:  Two column dataframe. One column should be labels for the
    demand locations (e.g. LSOA identifiers, postcodes). Second column should contain
    demand figures of some kind (e.g. number of historical cases)
    If demand assumed to be equal, all values in this column could be 1.

    travel_matrix: pd.dataframe: dataframe with columns representing sites
    and rows representing locations demand will come from.
    One column should be labels for the demand locations (e.g. LSOA identifiers, postcodes).
    All other values will be either distance or time in float form.
    No additional columns of information must be included or they will be used as part of the
    calculation of the lowest-cost solution, which may lead to incorrect results.
    '''
    def __init__(self, demand, travel_matrix, merge_col, demand_col):
        '''
        Store the demand and travel times

        Args:
            demand: pd.DataFrame:

            travel_matrix: pd.DataFrame:
        '''
        self.demand = demand.set_index(merge_col)
        self.travel_matrix = travel_matrix.set_index(merge_col)
        self.demand_col = demand_col


    def evaluate_solution(self, site_list):
        '''
        Calculates the

        Args:
            site_list: list: column indices of solution to evaluate
                            (to apply to travel matrix)

        Returns:
            Pandas dataframe to pass to evaluation functions

        '''

        active_facilities = self.travel_matrix.iloc[:, site_list].copy()

        # Assume travel to closest facility
        # Need to drop the column that contains
        active_facilities['min_cost'] = active_facilities.min(axis=1)


        # Merge demand and travel times into a single DataFrame
        problem = self.demand.merge(active_facilities,
                                    left_index=True, right_index=True,
                                    how='inner')

        return problem.reset_index()


    def generate_solution_metrics(self, site_list):
        '''
        Calculates the weighted average travel time for selected sites

        Args:
            site_list: list or np.array: A list of site IDs as a list or array (e.g. [0, 3, 4])
            merge_col: string: The column name to use for merging the data.
            n_patients_or_referrals_col: string: The column name to use for the number of patients or referrals.

        Returns:
            A tuple containing the problem and the maximum travel time.
        '''
        problem = self.evaluate_solution(site_list)

        # Return weighted average
        weighted_average = np.average(problem['min_cost'], weights=problem[self.demand_col])
        unweighted_average = np.average(problem['min_cost'])
        max_travel = np.max(problem['min_cost'])

        return {
            'site_indices': site_list,
            'site_names': ", ".join(self.travel_matrix.columns[site_list].tolist()),
            'weighted_average': weighted_average,
            'unweighted_average': unweighted_average,
            'max': max_travel,
            'problem_df': problem
        }

def all_combinations(n_facilities, p):
    facility = np.arange(n_facilities, dtype=np.uint8)
    return [np.array(a) for a in combinations(facility, p)]

brighton_demand = pd.read_csv("https://raw.githubusercontent.com/hsma-programme/h6_3d_facility_location_problems/main/h6_3d_facility_location_problems/example_code/brighton_demand.csv").drop(columns=["Unnamed: 0"])

brighton_sites = geopandas.read_file("https://raw.githubusercontent.com/hsma-programme/h6_3d_facility_location_problems/main/h6_3d_facility_location_problems/example_code/brighton_sites.geojson")

brighton_travel = pd.read_csv("https://raw.githubusercontent.com/hsma-programme/h6_3d_facility_location_problems/main/h6_3d_facility_location_problems/example_code/brighton_travel_matrix_driving.csv").drop(columns=["Unnamed: 0"])

lsoa_boundaries = geopandas.read_file("https://raw.githubusercontent.com/hsma-programme/h6_3d_facility_location_problems/main/h6_3d_facility_location_problems/example_code/LSOA_2011_Boundaries_Super_Generalised_Clipped_BSC_EW_V4.geojson")

location_problem_brighton = FacilityLocationObjective(
    demand=brighton_demand,
    travel_matrix=brighton_travel,
    merge_col="LSOA",
    demand_col="demand"
    )

possible_combinations_brighton = all_combinations(
    len(location_problem_brighton.travel_matrix.columns),
    4
    )

outputs = []

for possible_solution in possible_combinations_brighton:
    outputs.append(
        location_problem_brighton.generate_solution_metrics(
        possible_solution
    )
)

best_solution = pd.DataFrame(outputs).sort_values('weighted_average').head(1)
best_solution_df = best_solution['problem_df'].values[0]

fig, axs = plt.subplots(3, 5, figsize=(30, 15))

for i, ax in enumerate(fig.axes):

    solution = pd.DataFrame(outputs).sort_values('weighted_average').iloc[[i]]
    solution_df = solution['problem_df'].values[0]

    nearest_site_travel_brighton_gdf = pd.merge(
        lsoa_boundaries,
        solution_df,
        right_on = "LSOA",
        left_on = "LSOA11NM"
    )

    nearest_site_travel_brighton_gdf["min_cost_minutes"] = nearest_site_travel_brighton_gdf["min_cost"] / 60

    ax = nearest_site_travel_brighton_gdf.plot(
        "min_cost_minutes",
        legend=True,
        cmap="Blues",
        alpha=0.7,
        edgecolor="black",
        linewidth=0.5,
        figsize=(12,6),
        ax=ax
        )

    brighton_sites_bng = brighton_sites.to_crs('EPSG:27700').iloc[solution["site_indices"].values[0]]

    hospital_points = brighton_sites_bng.plot(ax=ax, color='magenta', markersize=60)

    for x, y, label in zip(brighton_sites_bng.geometry.x,
                        brighton_sites_bng.geometry.y,
                        brighton_sites_bng.site):
        ax.annotate(label, xy=(x,y), xytext=(10,3), textcoords="offset points", bbox=dict(facecolor='white'))

    ax.axis('off')

    weighted_travel_time_solution = solution["weighted_average"].values[0]
    weighted_travel_time_solution_minutes = (weighted_travel_time_solution / 60).round(2)

    ax.set_title(f"Weighted Average:\n{weighted_travel_time_solution_minutes} minutes")