Reproduction of: Rapidly measuring spatial accessibility of COVID-19 healthcare resources: a case study of Illinois, USA
Original study by Kang, J. Y., A. Michels, F. Lyu, Shaohua Wang, N. Agbodo, V. L. Freeman, and Shaowen Wang. 2020. Rapidly measuring spatial accessibility of COVID-19 healthcare resources: a case study of Illinois, USA. International Journal of Health Geographics 19 (1):1–17. DOI:10.1186/s12942-020-00229-x.
Reproduction Authors: Joe Holler, Derrick Burt, and Kufre Udoh, with contributions from Peter Kedron, Drew An-Pham, Isaiah Bennett, and the Spring 2021 Open Source GIScience class at Middlebury
Reproduction Materials Available at: github.com/HEGSRR/RPr-Kang-2020
Created: 2021-06-01
Revised: 2023-11-09
The original Kang et al. (2020) study is a network analysis that investigates access to COVID-19-related healthcare resources in the state of Illinois. The analysis uses hospital point locations with capacity represented by ICU beds and ventilators, OSM road networks, population data with the at-risk population represented by ages over 50, and COVID-19 case data. First, catchment areas are calculated for each hospital, showing where the hospital is accessible within drive times of 10, 20, and 30 minutes, assigned weights of 1, 0.68, and 0.22 respectively. Centroid points are then created for both the at-risk population data and the COVID-19 cases. Next, a weighted service ratio is calculated by executing a spatial join between the population centroids and the weighted hospital catchment polygons. The weighted service ratios are then aggregated to a grid of hexagons and rescaled on a spectrum of 0 to 1, where 1 is the greatest accessibility to healthcare resources.

Our original reproduction of Kang et al. (2020) was partially successful in reproducing the results of Kang et al. (2020). It focused on only the extent of the city of Chicago instead of all of Illinois, because the statewide files were too large for the storage capacity of GitHub repositories. While our reproduction achieved a Spearman's Rho of 0.532, it identified areas for improvement on which this reanalysis focuses. The first is fixing the boundary issue by buffering the extent of the road network to expand outside the city and include any roads or populations that might access hospitals within the city boundaries. The second is reducing the uncertainty introduced by generalizing all roads with unknown speed limits to 35 mph, by instead using speed limits by road type from the OSMnx package. The third is improving accuracy by changing the method of aggregating data into the hexagons to an area-weighted reaggregation, instead of the threshold requiring service polygons to cover over 50% of a hexagon.
This reanalysis therefore makes three major changes to improve reproducibility and develop scientific discovery: buffering the road network, imputing speed limits by road type, and area-weighted reaggregation.
To perform the E2SFCA method, three types of data are required: (1) a road network, (2) population data, and (3) hospital information. The road network can be obtained from the OpenStreetMap Python library, OSMnx. The population data is available from the American Community Survey. Lastly, hospital information is publicly available from the Homeland Infrastructure Foundation-Level Data.
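For reference, the two steps of the enhanced two-step floating catchment area (E2SFCA) method (Luo & Qi 2009), as implemented in the code below, can be written as:

$$R_j = \frac{S_j}{\sum_{k \in \{t_{kj} \le 30\,\text{min}\}} P_k W_{r(k,j)}} \qquad\qquad A_i = \sum_{j \in \{t_{ij} \le 30\,\text{min}\}} R_j W_{r(i,j)}$$

where $S_j$ is the capacity of hospital $j$ (ICU beds or ventilators), $P_k$ is the population at centroid $k$, $t$ is drive time, $r$ is the drive-time zone, and $W_r$ is the distance-decay weight (1.0, 0.68, or 0.22 for the 0–10, 10–20, and 20–30 minute zones). The code below computes the service ratio $R_j$ in hospital_measure_acc and sums the accessibility $A_i$ over the hexagon grid in overlapping_function.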
Import necessary libraries to run this model. See environment.yml for the library versions used for this analysis.
# Import modules
import numpy as np
import pandas as pd
import geopandas as gpd
import networkx as nx
import osmnx as ox
import re
from shapely.geometry import Point, LineString, Polygon
import matplotlib.pyplot as plt
from tqdm import tqdm
import multiprocessing as mp
import folium
import itertools
import os
import time
import warnings
import IPython
import requests
from IPython.display import display, clear_output
warnings.filterwarnings("ignore")
print('\n'.join(f'{m.__name__}=={m.__version__}' for m in globals().values() if getattr(m, '__version__', None)))
numpy==1.22.0
pandas==1.3.5
geopandas==0.10.2
networkx==2.6.3
osmnx==1.1.2
re==2.2.1
folium==0.12.1.post1
IPython==8.3.0
requests==2.27.1
Because we have restructured the repository for replication, we need to check our working directory and make necessary adjustments.
# Check working directory
os.getcwd()
'/home/jovyan/work/RPr-Kang-2020/procedure/code'
# Use to set work directory properly
if os.path.basename(os.getcwd()) == 'code':
    os.chdir('../../')
os.getcwd()
'/home/jovyan/work/RPr-Kang-2020'
If you would like to use the data generated from the pre-processing scripts, use the following code:
covid_data = gpd.read_file('./data/raw/public/Pre-Processing/covid_pre-processed.shp')
atrisk_data = gpd.read_file('./data/raw/public/Pre-Processing/atrisk_pre-processed.shp')
# Read in at risk population data
atrisk_data = gpd.read_file('./data/raw/public/PopData/Illinois_Tract.shp')
atrisk_data.head()
 | GEOID | STATEFP | COUNTYFP | TRACTCE | NAMELSAD | Pop | Unnamed_ 0 | NAME | OverFifty | TotalPop | geometry
---|---|---|---|---|---|---|---|---|---|---|---
0 | 17091011700 | 17 | 091 | 011700 | Census Tract 117 | 3688 | 588 | Census Tract 117, Kankakee County, Illinois | 1135 | 3688 | POLYGON ((-87.88768 41.13594, -87.88764 41.136... |
1 | 17091011800 | 17 | 091 | 011800 | Census Tract 118 | 2623 | 220 | Census Tract 118, Kankakee County, Illinois | 950 | 2623 | POLYGON ((-87.89410 41.14388, -87.89400 41.143... |
2 | 17119400951 | 17 | 119 | 400951 | Census Tract 4009.51 | 5005 | 2285 | Census Tract 4009.51, Madison County, Illinois | 2481 | 5005 | POLYGON ((-90.11192 38.70281, -90.11128 38.703... |
3 | 17119400952 | 17 | 119 | 400952 | Census Tract 4009.52 | 3014 | 2299 | Census Tract 4009.52, Madison County, Illinois | 1221 | 3014 | POLYGON ((-90.09442 38.72031, -90.09360 38.720... |
4 | 17135957500 | 17 | 135 | 957500 | Census Tract 9575 | 2869 | 1026 | Census Tract 9575, Montgomery County, Illinois | 1171 | 2869 | POLYGON ((-89.70369 39.34803, -89.69928 39.348... |
# Read in covid case data
covid_data = gpd.read_file('./data/raw/public/PopData/Chicago_ZIPCODE.shp')
covid_data.head()
 | ZCTA5CE10 | County | State | Join | ZONE | ZONENAME | FIPS | pop | cases | geometry
---|---|---|---|---|---|---|---|---|---|---
0 | 60660 | Cook County | IL | Cook County IL | IL_E | Illinois East | 1201 | 43242 | 78 | POLYGON ((-87.65049 41.99735, -87.65029 41.996... |
1 | 60640 | Cook County | IL | Cook County IL | IL_E | Illinois East | 1201 | 69715 | 117 | POLYGON ((-87.64645 41.97965, -87.64565 41.978... |
2 | 60614 | Cook County | IL | Cook County IL | IL_E | Illinois East | 1201 | 71308 | 134 | MULTIPOLYGON (((-87.67703 41.91845, -87.67705 ... |
3 | 60712 | Cook County | IL | Cook County IL | IL_E | Illinois East | 1201 | 12539 | 42 | MULTIPOLYGON (((-87.76181 42.00465, -87.76156 ... |
4 | 60076 | Cook County | IL | Cook County IL | IL_E | Illinois East | 1201 | 31867 | 114 | MULTIPOLYGON (((-87.74782 42.01540, -87.74526 ... |
Note that 999 is treated as a "NULL"/"NA" value, so hospitals with 999 in the capacity fields are filtered out. This data contains the number of ICU beds and ventilators at each hospital.
# Read in hospital data
hospitals = gpd.read_file('./data/raw/public/HospitalData/Chicago_Hospital_Info.shp')
hospitals.head()
 | FID | Hospital | City | ZIP_Code | X | Y | Total_Bed | Adult ICU | Total Vent | geometry
---|---|---|---|---|---|---|---|---|---|---
0 | 2 | Methodist Hospital of Chicago | Chicago | 60640 | -87.671079 | 41.972800 | 145 | 36 | 12 | MULTIPOINT (-87.67108 41.97280) |
1 | 4 | Advocate Christ Medical Center | Oak Lawn | 60453 | -87.732483 | 41.720281 | 785 | 196 | 64 | MULTIPOINT (-87.73248 41.72028) |
2 | 13 | Evanston Hospital | Evanston | 60201 | -87.683288 | 42.065393 | 354 | 89 | 29 | MULTIPOINT (-87.68329 42.06539) |
3 | 24 | AMITA Health Adventist Medical Center Hinsdale | Hinsdale | 60521 | -87.920116 | 41.805613 | 261 | 65 | 21 | MULTIPOINT (-87.92012 41.80561) |
4 | 25 | Holy Cross Hospital | Chicago | 60629 | -87.690841 | 41.770001 | 264 | 66 | 21 | MULTIPOINT (-87.69084 41.77000) |
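As a minimal sketch of the filtering described above (assuming the 999 placeholder appears in the Adult ICU and Total Vent fields of this GeoDataFrame), the NULL hospitals could be dropped like this:

# Hedged sketch: drop hospitals whose capacity fields use the 999 "NULL" placeholder
hospitals = hospitals[(hospitals['Adult ICU'] != 999) & (hospitals['Total Vent'] != 999)]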
# Plot hospital data
m = folium.Map(location=[41.85, -87.65], tiles='cartodbpositron', zoom_start=10)
for i in range(0, len(hospitals)):
    folium.CircleMarker(
        location=[hospitals.iloc[i]['Y'], hospitals.iloc[i]['X']],
        popup="{}{}\n{}{}\n{}{}".format('Hospital Name: ', hospitals.iloc[i]['Hospital'],
                                        'ICU Beds: ', hospitals.iloc[i]['Adult ICU'],
                                        'Ventilators: ', hospitals.iloc[i]['Total Vent']),
        radius=5,
        color='blue',
        fill=True,
        fill_opacity=0.6
    ).add_to(m)
# Build a simple HTML legend and attach it to the map
legend_html = '''<div style="position: fixed; width: 20%; height: auto;
     bottom: 10px; left: 10px;
     border: 2px solid grey; z-index: 9999; font-size: 14px;
     background-color: white;">&nbsp;Legend<br>
     &nbsp;<span style="color:blue;">&#9679;</span>&nbsp;Hospitals</div>'''
m.get_root().html.add_child(folium.Element(legend_html))
m
# Read in and plot grid file for Chicago
grid_file = gpd.read_file('./data/raw/public/GridFile/Chicago_Grid.shp')
grid_file.plot(figsize=(8,8))
<AxesSubplot:>
If Chicago_Network_Buffer.graphml does not already exist, this cell will query the road network from OpenStreetMap. Each of the road network code blocks may take a few minutes to run.
%%time
# To create a new graph from OpenStreetMap, delete or rename data/raw/private/Chicago_Network_Buffer.graphml
# (if it exists), and set OSM to True
OSM = False
# if the buffered street network is not saved and OSM is preferred,
# generate a new graph from OpenStreetMap and save it
if not os.path.exists("./data/raw/private/Chicago_Network_Buffer.graphml") and OSM:
    print("Loading buffered Chicago road network from OpenStreetMap. Please wait... runtime may exceed 9min...", flush=True)
    G = ox.graph_from_place('Chicago', network_type='drive', buffer_dist=24140.2)
    print("Saving Chicago road network to raw/private/Chicago_Network_Buffer.graphml. Please wait...", flush=True)
    ox.save_graphml(G, './data/raw/private/Chicago_Network_Buffer.graphml')
    print("Data saved.")
# otherwise, if the buffered street network is not saved, download the graph from the OSF project
elif not os.path.exists("./data/raw/private/Chicago_Network_Buffer.graphml"):
    print("Downloading buffered Chicago road network from OSF...", flush=True)
    url = 'https://osf.io/download/z8ery/'
    r = requests.get(url, allow_redirects=True)
    print("Saving buffered Chicago road network to file...", flush=True)
    open('./data/raw/private/Chicago_Network_Buffer.graphml', 'wb').write(r.content)
# if the buffered street network is already saved, load it
if os.path.exists("./data/raw/private/Chicago_Network_Buffer.graphml"):
    print("Loading buffered Chicago road network from raw/private/Chicago_Network_Buffer.graphml. Please wait...", flush=True)
    G = ox.load_graphml('./data/raw/private/Chicago_Network_Buffer.graphml')
    print("Data loaded.")
else:
    print("Error: could not load the road network from file.")
Loading buffered Chicago road network from raw/private/Chicago_Network_Buffer.graphml. Please wait...
Data loaded.
CPU times: user 39.2 s, sys: 1.67 s, total: 40.9 s
Wall time: 41 s
%%time
ox.plot_graph(G, node_size = 1, bgcolor = 'white', node_color = 'black', edge_color = "#333333", node_alpha = 0.5, edge_linewidth = 0.5)
CPU times: user 58.5 s, sys: 373 ms, total: 58.9 s
Wall time: 58.7 s
(<Figure size 576x576 with 1 Axes>, <AxesSubplot:>)
Display all the unique speed limit values and count how many network edges (road segments) have each value. We will compare this to our cleaned network later.
%%time
# Turn nodes and edges into geodataframes
nodes, edges = ox.graph_to_gdfs(G, nodes=True, edges=True)
# Get unique counts of road segments for each speed limit
print(edges['maxspeed'].value_counts())
print(str(len(edges)) + " edges in graph")
# can we also visualize highways / roads with higher speed limits to check accuracy?
# the code above converts the graph into an edges geodataframe, which could theoretically be filtered
# by fast road segments and mapped, e.g. in folium (see the sketch after the output below)
25 mph                        4793
30 mph                        3555
35 mph                        3364
40 mph                        2093
45 mph                        1418
20 mph                        1155
55 mph                         614
60 mph                         279
50 mph                         191
40                              79
15 mph                          76
70 mph                          71
65 mph                          54
10 mph                          38
[40 mph, 45 mph]                27
[30 mph, 35 mph]                26
45,30                           24
[40 mph, 35 mph]                22
70                              21
25                              20
[55 mph, 45 mph]                16
25, east                        14
[45 mph, 35 mph]                13
[30 mph, 25 mph]                10
[45 mph, 50 mph]                 8
50                               8
[40 mph, 30 mph]                 7
[35 mph, 25 mph]                 6
[55 mph, 60 mph]                 5
20                               4
[70 mph, 60 mph]                 3
[65 mph, 60 mph]                 3
[40 mph, 45 mph, 35 mph]         3
[70 mph, 65 mph]                 2
[70 mph, 45 mph, 5 mph]          2
[40, 45 mph]                     2
[35 mph, 50 mph]                 2
35                               2
[55 mph, 65 mph]                 2
[40 mph, 50 mph]                 2
[15 mph, 25 mph]                 2
[40 mph, 35 mph, 25 mph]         2
[15 mph, 40 mph, 30 mph]         2
[20 mph, 25 mph]                 2
[30 mph, 25, east]               2
[65 mph, 55 mph]                 2
[20 mph, 35 mph]                 2
[55 mph, 55]                     2
55                               2
[15 mph, 30 mph]                 2
[45 mph, 30 mph]                 2
[15 mph, 45 mph]                 2
[55 mph, 45, east, 50 mph]       2
[20 mph, 30 mph]                 1
[5 mph, 45 mph, 35 mph]          1
[55 mph, 35 mph]                 1
[5 mph, 35 mph]                  1
[55 mph, 50 mph]                 1
Name: maxspeed, dtype: int64
384240 edges in graph
CPU times: user 36.6 s, sys: 97.9 ms, total: 36.7 s
Wall time: 36.6 s
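Responding to the comment in the code above, here is a minimal sketch of filtering and mapping the fast road segments. It is an illustration under the assumption that the numeric part of each maxspeed string comes first (list-valued entries become strings like "[40 mph, 45 mph]", so the first number is used):

# Hedged sketch: map road segments tagged 55 mph or faster as a visual accuracy check
fast_edges = edges.copy()
# extract the first number from each maxspeed value; missing values become NaN and drop out
fast_edges['maxspeed_num'] = fast_edges['maxspeed'].astype(str).str.extract(r'(\d+)', expand=False).astype(float)
fast_edges = fast_edges[fast_edges['maxspeed_num'] >= 55]
# draw all edges in light gray and the fast edges in red on top
ax = edges.plot(color='#dddddd', linewidth=0.2, figsize=(8, 8))
fast_edges.plot(ax=ax, color='red', linewidth=0.8)
ax.set_title('Road segments tagged 55 mph or faster')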
edges.head()
u | v | key | osmid | highway | oneway | length | name | geometry | lanes | ref | bridge | maxspeed | access | service | tunnel | junction | width | area
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
261095436 | 261095437 | 0 | 24067717 | residential | False | 46.873 | NaN | LINESTRING (-87.90237 42.10571, -87.90198 42.1... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
261095437 | 261095439 | 0 | 24067717 | residential | False | 46.317 | NaN | LINESTRING (-87.90198 42.10540, -87.90159 42.1... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
 | 261095436 | 0 | 24067717 | residential | False | 46.873 | NaN | LINESTRING (-87.90198 42.10540, -87.90237 42.1... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
 | 261109275 | 0 | 24069424 | residential | False | 34.892 | NaN | LINESTRING (-87.90198 42.10540, -87.90227 42.1... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
 | 261109274 | 0 | 24069424 | residential | False | 47.866 | NaN | LINESTRING (-87.90198 42.10540, -87.90156 42.1... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
Cleans the OSMNX network to work better with drive-time analysis.
First, we remove all nodes with 0 outdegree because any hospital assigned to such a node would be unreachable from everywhere. Next, we remove small (under 10 node) strongly connected components to reduce erroneously small ego-centric networks. Lastly, we ensure that the max speed is set and in the correct units before calculating time.
Args:
    network: a NetworkX MultiDiGraph street network from OSMnx
Returns:
    the cleaned network, with 'speed_kph' and 'travel_time' (seconds) set on each edge
# view all highway types
print(edges['highway'].value_counts())
residential                     296481
secondary                        30909
tertiary                         29216
primary                          19277
motorway_link                     2322
unclassified                      1840
motorway                          1449
trunk                              843
primary_link                       833
secondary_link                     356
living_street                      238
trunk_link                         157
tertiary_link                      121
[residential, unclassified]         69
[tertiary, residential]             66
[secondary, primary]                15
[secondary, tertiary]               10
[motorway, motorway_link]            6
[tertiary, unclassified]             6
[motorway, trunk]                    4
[residential, living_street]         4
[secondary, secondary_link]          3
busway                               2
[motorway, primary]                  2
[tertiary, motorway_link]            2
emergency_bay                        2
[trunk, primary]                     2
[tertiary, tertiary_link]            1
[trunk, motorway]                    1
[primary, motorway_link]             1
[secondary, motorway_link]           1
[primary_link, residential]          1
Name: highway, dtype: int64
# two things about this function:
# 1) the work to remove nodes is hardly worth it now that OSMnx cleans graphs by default:
#    the function now prunes fewer than 300 nodes
# 2) try using the OSMnx speed module for setting speeds and travel times
#    https://osmnx.readthedocs.io/en/stable/user-reference.html#module-osmnx.speed
#    just be careful about units: add_edge_travel_times stores 'travel_time' in seconds,
#    matching the thresholds of 600/1200/1800 seconds (10/20/30 minutes) used below
def network_setting(network):
    # remove nodes with zero out-degree: a hospital snapped to such a node would be unreachable
    _nodes_removed = len([n for (n, deg) in network.out_degree() if deg == 0])
    network.remove_nodes_from([n for (n, deg) in network.out_degree() if deg == 0])
    # remove small (under 10 node) strongly connected components
    for component in list(nx.strongly_connected_components(network)):
        if len(component) < 10:
            for node in component:
                _nodes_removed += 1
                network.remove_node(node)
    # impute edge speeds by highway type and calculate travel times in seconds
    network = ox.speed.add_edge_speeds(network)
    network = ox.speed.add_edge_travel_times(network)
    print("Removed {} nodes ({:2.4f}%) from the OSMNX network".format(_nodes_removed, 100 * _nodes_removed / float(network.number_of_nodes())))
    print("Number of nodes: {}".format(network.number_of_nodes()))
    print("Number of edges: {}".format(network.number_of_edges()))
    return network
%%time
# G, hospitals, grid_file, pop_data = file_import (population_dropdown.value)
G = network_setting(G)
# Create point geometries for each node in the graph, to make constructing catchment area polygons easier
for node, data in G.nodes(data=True):
    data['geometry'] = Point(data['x'], data['y'])
# Modify code to react to processor dropdown (got rid of file_import function)
Removed 274 nodes (0.1929%) from the OSMNX network
Number of nodes: 142044
Number of edges: 383911
CPU times: user 45.9 s, sys: 465 ms, total: 46.4 s
Wall time: 46.3 s
Display all the unique speed limit values and count how many network edges (road segments) have each value. Compare to the previous results.
%%time
## Get unique counts for each road network
# Turn nodes and edges in geodataframes
nodes, edges = ox.graph_to_gdfs(G, nodes=True, edges=True)
# Check that osmnx added speeds and travel times to graph
print(edges['speed_kph'].value_counts())
print(str(len(edges)) + " edges in graph")
print(edges['travel_time'].value_counts())
39.2     291413
48.3      29822
56.7      26353
60.1      14985
40.2       5604
56.3       3364
86.3       2200
64.4       2093
32.2       1872
42.9       1793
72.4       1418
69.8        654
88.5        606
90.1        565
96.6        277
80.5        191
51.0        118
40.0         80
24.1         76
112.7        61
104.6        42
16.1         38
25.0         34
68.0         29
52.0         26
45.3         24
60.0         24
70.0         21
64.0         18
80.0         16
44.0         12
56.0          9
76.0          8
50.0          8
48.0          8
36.0          6
92.0          5
96.0          4
71.0          4
20.0          4
104.0         3
32.0          3
72.0          3
45.0          3
100.0         3
52.4          2
55.0          2
108.0         2
35.0          2
53.0          2
84.0          1
Name: speed_kph, dtype: int64
383911 edges in graph
9.3     14185
9.2     11922
18.6     8012
9.4      7209
18.5     6608
        ...
199.5       1
115.5       1
145.7       1
122.3       1
136.9       1
Name: travel_time, Length: 1183, dtype: int64
CPU times: user 36.1 s, sys: 143 ms, total: 36.2 s
Wall time: 36.2 s
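The network_setting() function above relies on OSMnx's default imputation, which fills missing maxspeed values with the mean observed speed per highway type. As a hedged sketch (the km/h values below are illustrative assumptions, not the settings used in this analysis), a custom mapping of highway types to speeds could be supplied instead:

# Hedged sketch: override the per-highway-type speed imputation with custom values (km/h)
hwy_speeds = {'residential': 40.2, 'secondary': 56.3, 'tertiary': 48.3}  # ~25, 35, and 30 mph
G = ox.speed.add_edge_speeds(G, hwy_speeds=hwy_speeds)
G = ox.speed.add_edge_travel_times(G)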
def hospital_setting(hospitals, G):
    # Create an empty column
    hospitals['nearest_osm'] = None
    # Fill the nearest_osm column with each hospital's nearest OSM network node
    for i in tqdm(hospitals.index, desc="Find the nearest network node from hospitals", position=0):
        hospitals['nearest_osm'][i] = ox.get_nearest_node(G, [hospitals['Y'][i], hospitals['X'][i]], method='euclidean') # find the nearest node to the hospital location
    print('hospital setting is done')
    return hospitals
Converts geodata to centroids
Args:
    pop_data: a GeoDataFrame of population polygons (census tracts or zip codes)
    pop_type: "pop" for the over-50 population or "covid" for COVID-19 cases
Returns:
    a GeoDataFrame of centroid points with columns code, pop, and geometry
def pop_centroid(pop_data, pop_type):
    pop_data = pop_data.to_crs({'init': 'epsg:4326'})
    # If pop is selected in the dropdown, keep tracts with a non-negative over-50 population
    if pop_type == "pop":
        pop_data = pop_data[pop_data['OverFifty'] >= 0]
    # If covid is selected in the dropdown, keep zip codes with a non-negative case count
    if pop_type == "covid":
        pop_data = pop_data[pop_data['cases'] >= 0]
    pop_cent = pop_data.centroid # reduce each polygon to its centroid point, dropping other attributes
    # Rebuild a GeoDataFrame pairing each centroid with its identifier and population value
    pop_centroid = gpd.GeoDataFrame()
    i = 0
    for point in tqdm(pop_cent, desc='Pop Centroid File Setting', position=0):
        if pop_type == "pop":
            pop = pop_data.iloc[i]['OverFifty']
            code = pop_data.iloc[i]['GEOID']
        if pop_type == "covid":
            pop = pop_data.iloc[i]['cases']
            code = pop_data.iloc[i].ZCTA5CE10
        pop_centroid = pop_centroid.append({'code': code, 'pop': pop, 'geometry': point}, ignore_index=True)
        i = i + 1
    return pop_centroid
Function written by Joe Holler + Derrick Burt. It is a more efficient way to calculate distance-weighted catchment areas for each hospital. The algorithm runs faster than the original one ("calculate_catchment_area"). It first creates a dictionary (of each node and its corresponding drive time from the hospital) of all nodes within a 30-minute drive time (using the single_source_dijkstra_path_length function). From there, two more dictionaries are constructed by querying the original one. From these dictionaries, single-part convex hulls are created for each drive-time interval and appended to a single list (one list with 3 polygon geometries). Within the list, the polygons are differenced from each other to produce three catchment areas.
Args:
    G: the street network graph, with 'geometry' point attributes assigned to nodes
    nearest_osm: the network node nearest to the hospital
    distances: travel-time thresholds in seconds, e.g. [600, 1200, 1800]
    distance_unit: the edge attribute to accumulate (default "travel_time")
Returns:
    a list of three GeoDataFrame polygons (10, 20, and 30 minute catchments)
def dijkstra_cca_polygons(G, nearest_osm, distances, distance_unit="travel_time"):
    '''
    Before running: point geometries must be assigned to the street network nodes, e.g.
    # create point geometries for the entire graph
    for node, data in G.nodes(data=True):
        data['geometry'] = Point(data['x'], data['y'])
    '''
    ## CREATE DICTIONARIES
    # create a dictionary of nearest nodes: the largest (30 minute) drive time,
    # from which the 10 and 20 minute drive times can be extracted
    nearest_nodes_30 = nx.single_source_dijkstra_path_length(G, nearest_osm, distances[2], distance_unit)
    # extract values within the 20 and 10 minute (respectively) drive times
    nearest_nodes_20 = dict()
    nearest_nodes_10 = dict()
    for key, value in nearest_nodes_30.items():
        if value <= distances[1]:
            nearest_nodes_20[key] = value
        if value <= distances[0]:
            nearest_nodes_10[key] = value
    ## CREATE POLYGONS FOR 3 DISTANCE CATEGORIES (10 min, 20 min, 30 min)
    # 30 MIN
    # Because the graph already has a geometry attribute with point data,
    # this line creates a GeoPandas GeoDataFrame from the nearest_nodes_30 dictionary
    points_30 = gpd.GeoDataFrame(gpd.GeoSeries(nx.get_node_attributes(G.subgraph(nearest_nodes_30), 'geometry')))
    # Convert the nearest_nodes_30 dictionary into a pandas data frame and join it to the points;
    # left_index=True and right_index=True tell merge() to join on the index values
    points_30 = points_30.merge(pd.Series(nearest_nodes_30).to_frame(), left_index=True, right_index=True)
    # Rename the columns and set the GeoDataFrame geometry to the geometry column
    points_30 = points_30.rename(columns={'0_x': 'geometry', '0_y': 'z'}).set_geometry('geometry')
    # Create a convex hull polygon from the points
    polygon_30 = gpd.GeoDataFrame(gpd.GeoSeries(points_30.unary_union.convex_hull))
    polygon_30 = polygon_30.rename(columns={0: 'geometry'}).set_geometry('geometry')
    # 20 MIN
    # Select nodes within the second travel-time threshold (1200 s = 20 min)
    points_20 = points_30.query("z <= {}".format(distances[1]))
    # Create a convex hull polygon from the points
    polygon_20 = gpd.GeoDataFrame(gpd.GeoSeries(points_20.unary_union.convex_hull))
    polygon_20 = polygon_20.rename(columns={0: 'geometry'}).set_geometry('geometry')
    # 10 MIN
    # Select nodes within the first travel-time threshold (600 s = 10 min)
    points_10 = points_30.query("z <= {}".format(distances[0]))
    # Create a convex hull polygon from the points
    polygon_10 = gpd.GeoDataFrame(gpd.GeoSeries(points_10.unary_union.convex_hull))
    polygon_10 = polygon_10.rename(columns={0: 'geometry'}).set_geometry('geometry')
    # Collect the polygons in a list, smallest catchment first
    polygons = [polygon_10, polygon_20, polygon_30]
    # Clip the overlapping distance polygons (create two donuts + hole)
    for i in reversed(range(1, len(distances))):
        polygons[i] = gpd.overlay(polygons[i], polygons[i-1], how="difference")
    return polygons
Measures the effect of a single hospital on the surrounding area (uses dijkstra_cca_polygons).
Args:
    _thread_id: integer identifying the hospital (used for progress reporting)
    hospital: one row of the hospitals GeoDataFrame
    pop_data: GeoDataFrame of population centroids
    distances: travel-time thresholds in seconds
    weights: distance-decay weights matching the distances
Returns:
    a tuple of (_thread_id, list of three catchment polygons with service ratios)
def hospital_measure_acc(_thread_id, hospital, pop_data, distances, weights):
    # Create catchment polygons for this hospital
    polygons = dijkstra_cca_polygons(G, hospital['nearest_osm'], distances)
    # Calculate accessibility measurements
    num_pops = []
    for j in pop_data.index:
        point = pop_data['geometry'][j]
        # Weight each centroid's population by the distance band it falls within
        for k in range(len(polygons)):
            if len(polygons[k]) > 0: # skip degenerate catchments whose convex hull is not a polygon
                if point.within(polygons[k].iloc[0]["geometry"]):
                    num_pops.append(pop_data['pop'][j] * weights[k])
    total_pop = sum(num_pops)
    for i in range(len(distances)):
        polygons[i]['time'] = distances[i]
        polygons[i]['total_pop'] = total_pop
        polygons[i]['hospital_icu_beds'] = float(hospital['Adult ICU']) / polygons[i]['total_pop'] # ICU beds per distance-weighted person
        polygons[i]['hospital_vents'] = float(hospital['Total Vent']) / polygons[i]['total_pop'] # ventilators per distance-weighted person
        polygons[i].crs = {'init': 'epsg:4326'}
        polygons[i] = polygons[i].to_crs({'init': 'epsg:32616'})
    print('{:.0f}'.format(_thread_id), end=" ", flush=True)
    return (_thread_id, [polygon.copy(deep=True) for polygon in polygons])
Parallel implementation of accessibility measurement.
Args:
    hospitals: GeoDataFrame of hospitals with nearest_osm nodes assigned
    pop_data: GeoDataFrame of population centroids
    network: the street network graph
    distances: travel-time thresholds in seconds
    weights: distance-decay weights matching the distances
    num_proc: number of worker processes
Returns:
    a list of three GeoDataFrames of catchments (one per distance band) covering all hospitals
def hospital_acc_unpacker(args):
    return hospital_measure_acc(*args)
# WHERE THE RESULTS ARE POOLED AND THEN REAGGREGATED
def measure_acc_par(hospitals, pop_data, network, distances, weights, num_proc=4):
    catchments = []
    for distance in distances:
        catchments.append(gpd.GeoDataFrame())
    pool = mp.Pool(processes=num_proc)
    hospital_list = [hospitals.iloc[i] for i in range(len(hospitals))]
    print("Calculating", len(hospital_list), "hospital catchments...\ncompleted number:", end=" ")
    results = pool.map(hospital_acc_unpacker, zip(range(len(hospital_list)), hospital_list, itertools.repeat(pop_data), itertools.repeat(distances), itertools.repeat(weights)))
    pool.close()
    results.sort()
    results = [r[1] for r in results]
    for i in range(len(results)):
        for j in range(len(distances)):
            catchments[j] = catchments[j].append(results[i][j], sort=False)
    return catchments
Calculates and aggregates accessibility statistics for one catchment on our grid file.
Args:
    _id: index of the catchment polygon
    poly: one catchment polygon GeoDataFrame with its service ratio
    grid_file: the hexagon grid GeoDataFrame
    weight: the distance-decay weight for this catchment
    service_type: the resource column to aggregate (e.g., hospital_icu_beds)
Returns:
    a tuple of (_id, Counter of weighted service values keyed by hexagon id)
from collections import Counter
def overlap_calc(_id, poly, grid_file, weight, service_type):
    value_dict = Counter()
    if type(poly.iloc[0][service_type]) != type(None):
        value = float(poly[service_type]) * weight
        intersect = gpd.overlay(grid_file, poly, how='intersection')
        intersect['overlapped'] = intersect.area
        intersect['percent'] = intersect['overlapped'] / intersect['area']
        # keep only hexagons at least half covered by the catchment
        intersect = intersect[intersect['percent'] >= 0.5]
        # a Counter returns 0 for missing keys, so no try/except is needed
        for intersect_id in intersect['id']:
            value_dict[intersect_id] += value
    return (_id, value_dict)
def overlap_calc_unpacker(args):
    return overlap_calc(*args)
Calculates how all catchment areas overlap with and affect the accessibility of each grid in our grid file.
Args:
    grid_file: the hexagon grid GeoDataFrame
    catchments: list of three catchment GeoDataFrames from measure_acc_par
    service_type: the resource column to aggregate
    weights: distance-decay weights matching the catchments
    num_proc: number of worker processes
Returns:
    the grid_file with aggregated service_type values
def overlapping_function(grid_file, catchments, service_type, weights, num_proc=4):
    grid_file[service_type] = 0
    pool = mp.Pool(processes=num_proc)
    acc_list = []
    for i in range(len(catchments)):
        acc_list.extend([catchments[i][j:j+1] for j in range(len(catchments[i]))])
    acc_weights = []
    for i in range(len(catchments)):
        acc_weights.extend([weights[i]] * len(catchments[i]))
    results = pool.map(overlap_calc_unpacker, zip(range(len(acc_list)), acc_list, itertools.repeat(grid_file), acc_weights, itertools.repeat(service_type)))
    pool.close()
    results.sort()
    results = [r[1] for r in results]
    service_values = results[0]
    for result in results[1:]:
        service_values += result
    for intersect_id, value in service_values.items():
        grid_file.loc[grid_file['id'] == intersect_id, service_type] += value
    return grid_file
Normalizes our result (Geodataframe) for a given resource (res).
def normalization(result, res):
    result[res] = (result[res] - min(result[res])) / (max(result[res]) - min(result[res]))
    return result
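A minimal usage sketch of this min-max scaling, with made-up values:

# Illustrative only: min-max scaling maps the smallest value to 0 and the largest to 1
demo = pd.DataFrame({'res': [2.0, 4.0, 10.0]})
demo = normalization(demo, 'res')
print(demo['res'].tolist())  # [0.0, 0.25, 1.0]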
Plots the accessibility results over a base map, with the hospital points drawn on top.

NOTE: even if we calculate accessibility for just Chicago, we want to use the Illinois network (or at least we should not use the Chicago network) because using the Chicago network will result in hospitals near but outside of Chicago having an infinite distance (unreachable because roads do not extend past Chicago).

Args:
    output_grid: hexagon grid GeoDataFrame with accessibility results
    base_map: GeoDataFrame plotted for geographic context beneath the grid
    hospitals: GeoDataFrame of hospital points
    resource: the resource column to map
Returns:
    None (renders the map)
def output_map(output_grid, base_map, hospitals, resource):
    ax = output_grid.plot(column=resource, cmap='PuBuGn', figsize=(18,12), legend=True, zorder=1)
    # Next two lines set bounds for our x- and y-axes because it looks like there's a weird
    # point at the bottom left of the map that's messing up our frame (Maja)
    ax.set_xlim([314000, 370000])
    ax.set_ylim([540000, 616000])
    base_map.plot(ax=ax, facecolor="none", edgecolor='gray', lw=0.1)
    hospitals.plot(ax=ax, markersize=10, zorder=2, c='blue') # zorder=2 keeps hospitals above the grid
Below you can customize the input of the model:
import ipywidgets
from IPython.display import display
processor_dropdown = ipywidgets.Dropdown( options=[("1", 1), ("2", 2), ("3", 3), ("4", 4)],
value = 4, description = "Processor: ")
population_dropdown = ipywidgets.Dropdown( options=[("Population at Risk", "pop"), ("COVID-19 Patients", "covid") ],
value = "pop", description = "Population: ")
resource_dropdown = ipywidgets.Dropdown( options=[("ICU Beds", "hospital_icu_beds"), ("Ventilators", "hospital_vents") ],
value = "hospital_icu_beds", description = "Resource: ")
hospital_dropdown = ipywidgets.Dropdown( options=[("All hospitals", "hospitals"), ("Subset", "hospital_subset") ],
value = "hospitals", description = "Hospital:")
display(processor_dropdown,population_dropdown,resource_dropdown,hospital_dropdown)
Dropdown(description='Processor: ', index=3, options=(('1', 1), ('2', 2), ('3', 3), ('4', 4)), value=4)
Dropdown(description='Population: ', options=(('Population at Risk', 'pop'), ('COVID-19 Patients', 'covid')), …
Dropdown(description='Resource: ', options=(('ICU Beds', 'hospital_icu_beds'), ('Ventilators', 'hospital_vents…
Dropdown(description='Hospital:', options=(('All hospitals', 'hospitals'), ('Subset', 'hospital_subset')), val…
if population_dropdown.value == "pop":
    pop_data = pop_centroid(atrisk_data, population_dropdown.value)
elif population_dropdown.value == "covid":
    pop_data = pop_centroid(covid_data, population_dropdown.value)
distances = [600, 1200, 1800] # travel-time thresholds in seconds (10, 20, and 30 minutes)
weights = [1.0, 0.68, 0.22] # weights, where weights[0] is applied to distances[0]
# Other weighting options representing different distance decays
# weights1, weights2, weights3 = [1.0, 0.42, 0.09], [1.0, 0.75, 0.5], [1.0, 0.5, 0.1]
# it is surprising how long this function takes just to calculate centroids;
# why not do it with geopandas/pandas functions rather than iterating through every item?
# (see the vectorized sketch below)
Pop Centroid File Setting: 100%|██████████| 3121/3121 [03:54<00:00, 13.31it/s]
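As a hedged sketch of the vectorized alternative suggested in the comment above (assuming the same input columns as pop_centroid), the per-row loop could be replaced with GeoPandas operations:

# Hedged sketch: vectorized centroid construction without a Python loop
def pop_centroid_vectorized(pop_data, pop_type):
    pop_data = pop_data.to_crs(epsg=4326)
    if pop_type == "pop":
        pop_data = pop_data[pop_data['OverFifty'] >= 0]
        code, pop = pop_data['GEOID'], pop_data['OverFifty']
    else:
        pop_data = pop_data[pop_data['cases'] >= 0]
        code, pop = pop_data['ZCTA5CE10'], pop_data['cases']
    # build the output GeoDataFrame in one pass from the filtered columns
    return gpd.GeoDataFrame({'code': code.values, 'pop': pop.values},
                            geometry=pop_data.centroid.values, crs=pop_data.crs)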
If you have already run this code and changed the Hospital selection, rerun the Load Hospital Data block.
# Set hospitals according to hospital dropdown
if hospital_dropdown.value == "hospital_subset":
    hospitals = hospital_setting(hospitals[:1], G)
else:
    hospitals = hospital_setting(hospitals, G)
resources = ["hospital_icu_beds", "hospital_vents"] # resources
# this is also slower than it needs to be; if network nodes and hospitals are both
# geopandas data frames, it should be possible to do a much faster spatial join
# rather than iterating through every hospital (see the sketch below)
Find the nearest network node from hospitals: 100%|██████████| 66/66 [01:25<00:00, 1.29s/it]
hospital setting is done
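In the spirit of the comment above, here is a hedged sketch of a vectorized replacement for the per-hospital loop in hospital_setting(), using OSMnx's nearest_nodes (available in OSMnx >= 1.1; it accepts arrays of coordinates, and requires scikit-learn for unprojected graphs):

# Hedged sketch: assign all nearest network nodes at once instead of looping per hospital
hospitals['nearest_osm'] = ox.distance.nearest_nodes(G, X=hospitals['X'], Y=hospitals['Y'])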
# Create point geometries for entire graph
# what is the purpose of the following two lines? Can they be deleted?
# for node, data in G.nodes(data=True):
#     data['geometry']=Point(data['x'], data['y'])
# which hospital to visualize?
fighosp = 7
# Create catchment polygons for the selected hospital
poly = dijkstra_cca_polygons(G, hospitals['nearest_osm'][fighosp], distances)
# Reproject polygons
for i in range(len(poly)):
    poly[i].crs = {'init': 'epsg:4326'}
    poly[i] = poly[i].to_crs({'init': 'epsg:32616'})
# Reproject hospitals
# Possible to map from the hospitals data rather than creating hospital_subset?
hospital_subset = hospitals.iloc[[fighosp]].to_crs(epsg=32616)
fig, ax = plt.subplots(figsize=(12,8))
min_10 = poly[0].plot(ax=ax, color="royalblue", label="10 min drive")
min_20 = poly[1].plot(ax=ax, color="cornflowerblue", label="20 min drive")
min_30 = poly[2].plot(ax=ax, color="lightsteelblue", label="30 min drive")
hospital_subset.plot(ax=ax, color="red", legend=True, label = "hospital")
# Add legend
ax.legend()
<matplotlib.legend.Legend at 0x7f722bf49340>
poly
[ geometry 0 POLYGON ((443456.283 4609874.589, 441585.172 4..., geometry 0 POLYGON ((433443.581 4600237.316, 427780.923 4..., geometry 0 POLYGON ((438932.445 4588484.312, 431706.358 4...]
%%time
catchments = measure_acc_par(hospitals, pop_data, G, distances, weights, num_proc=processor_dropdown.value)
Calculating 66 hospital catchments... completed number: 5 15 0 10 6 1 16 11 2 7 17 12 3 8 18 13 4 9 19 14 20 25 30 35 21 31 26 36 22 27 32 37 28 23 33 38 29 24 34 39 40 45 50 55 41 46 51 56 42 47 52 57 43 48 58 53 44 49 59 54 60 65 61 62 63 64 CPU times: user 2.23 s, sys: 535 ms, total: 2.76 s Wall time: 2min 12s
%%time
for j in range(len(catchments)):
    catchments[j] = catchments[j][catchments[j][resource_dropdown.value] != float('inf')]
result = overlapping_function(grid_file, catchments, resource_dropdown.value, weights, num_proc=processor_dropdown.value)
CPU times: user 6.42 s, sys: 460 ms, total: 6.88 s Wall time: 17.9 s
Instead of aggregating the accessibility results with the 50% threshold, multiply each catchment fragment's service ratio by its distance-decay weight and its share of the hexagon's area, then reaggregate by hexagon id. For example, a hexagon 60% covered by a 10-minute catchment (weight 1.0) with service ratio r receives a contribution of r × 1.0 × 0.6 from that catchment.
# add weight field to each catchment polygon
for i in range(len(weights)):
    catchments[i]['weight'] = weights[i]
# combine the three sets of catchment polygons into one geodataframe
geocatchments = pd.concat([catchments[0], catchments[1], catchments[2]])
geocatchments
 | geometry | time | total_pop | hospital_icu_beds | hospital_vents | weight
---|---|---|---|---|---|---
0 | POLYGON ((446359.955 4637144.048, 444654.345 4... | 600 | 789023.74 | 0.000046 | 0.000015 | 1.00 |
0 | POLYGON ((438353.601 4609853.779, 432065.727 4... | 600 | 717916.38 | 0.000273 | 0.000089 | 1.00 |
0 | POLYGON ((442878.135 4648745.067, 441056.875 4... | 600 | 469346.52 | 0.000190 | 0.000062 | 1.00 |
0 | POLYGON ((423900.989 4621140.151, 421031.920 4... | 600 | 732753.60 | 0.000089 | 0.000029 | 1.00 |
0 | POLYGON ((443322.063 4615428.578, 438387.446 4... | 600 | 716375.12 | 0.000092 | 0.000029 | 1.00 |
... | ... | ... | ... | ... | ... | ... |
0 | POLYGON ((440431.675 4605335.203, 415910.447 4... | 1800 | 1015836.64 | 0.000027 | 0.000009 | 0.22 |
0 | MULTIPOLYGON (((418680.569 4620247.323, 411754... | 1800 | 754080.60 | 0.000060 | 0.000019 | 0.22 |
0 | POLYGON ((421589.871 4617483.974, 415910.447 4... | 1800 | 973822.96 | 0.000086 | 0.000028 | 0.22 |
0 | POLYGON ((438852.888 4603453.515, 415910.447 4... | 1800 | 936876.30 | 0.000065 | 0.000021 | 0.22 |
0 | POLYGON ((416049.771 4606193.128, 411127.822 4... | 1800 | 815589.78 | 0.000115 | 0.000037 | 0.22 |
198 rows × 6 columns
%%time
# set weighted to False for original 50% threshold method
# switch to True for area-weighted overlay
weighted = True
# if the value to be calculated is already in the hexagon grid, delete it;
# otherwise, the field name gets a suffix _1 in the overlay step
if resource_dropdown.value in list(grid_file.columns.values):
    grid_file = grid_file.drop(resource_dropdown.value, axis=1)
# calculate hexagon 'target' areas
grid_file['area'] = grid_file.area
# Intersection overlay of hospital catchments and hexagon grid
print("Intersecting hospital catchments with hexagon grid...")
fragments = gpd.overlay(grid_file, geocatchments, how='intersection')
# Calculate percent coverage of the hexagon by the hospital catchment as
# fragment area / target (hexagon) area
fragments['percent'] = fragments.area / fragments['area']
# if using weighted aggregation...
if weighted:
    print("Calculating area-weighted value...")
    # multiply the service/population ratio by the distance weight and the percent coverage
    fragments['value'] = fragments[resource_dropdown.value] * fragments['weight'] * fragments['percent']
# if using the 50% coverage rule for unweighted aggregation...
else:
    print("Calculating value for hexagons with >=50% overlap...")
    # filter for only the fragments with >= 50% coverage by hospital catchment
    fragments = fragments[fragments['percent'] >= 0.5]
    # multiply the service/population ratio by the distance weight
    fragments['value'] = fragments[resource_dropdown.value] * fragments['weight']
# select just the hexagon id and value from the fragments,
# group the fragments by the (hexagon) id,
# and sum the values
print("Summarizing results by hexagon id...")
sum_results = fragments[['id', 'value']].groupby(by=['id']).sum()
# join the results to the hexagon grid_file based on hexagon id
print("Joining results to hexagons...")
result_new = pd.merge(grid_file, sum_results, how="left", on="id")
# rename the value column to the resource name (rename returns a copy, so assign it back)
result_new = result_new.rename(columns={'value': resource_dropdown.value})
result_new
Intersecting hospital catchments with hexagon grid...
Calculating area-weighted value...
Summarizing results by hexagon id...
Joining results to hexagons...
CPU times: user 12.8 s, sys: 24.2 ms, total: 12.8 s
Wall time: 12.8 s
 | left | top | right | bottom | id | area | geometry | hospital_icu_beds
---|---|---|---|---|---|---|---|---
0 | 440843.416087 | 4.638515e+06 | 441420.766356 | 4.638015e+06 | 4158 | 216506.350946 | POLYGON ((440843.416 4638265.403, 440987.754 4... | 0.003564 |
1 | 440843.416087 | 4.638015e+06 | 441420.766356 | 4.637515e+06 | 4159 | 216506.350946 | POLYGON ((440843.416 4637765.403, 440987.754 4... | 0.003617 |
2 | 440843.416087 | 4.639515e+06 | 441420.766356 | 4.639015e+06 | 4156 | 216506.350946 | POLYGON ((440843.416 4639265.403, 440987.754 4... | 0.003626 |
3 | 440843.416087 | 4.639015e+06 | 441420.766356 | 4.638515e+06 | 4157 | 216506.350946 | POLYGON ((440843.416 4638765.403, 440987.754 4... | 0.003567 |
4 | 440843.416087 | 4.640515e+06 | 441420.766356 | 4.640015e+06 | 4154 | 216506.350946 | POLYGON ((440843.416 4640265.403, 440987.754 4... | 0.003676 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
3274 | 440843.416087 | 4.643015e+06 | 441420.766356 | 4.642515e+06 | 4149 | 216506.350946 | POLYGON ((440843.416 4642765.403, 440987.754 4... | 0.003583 |
3275 | 440843.416087 | 4.644515e+06 | 441420.766356 | 4.644015e+06 | 4146 | 216506.350946 | POLYGON ((440843.416 4644265.403, 440987.754 4... | 0.003473 |
3276 | 440843.416087 | 4.644015e+06 | 441420.766356 | 4.643515e+06 | 4147 | 216506.350946 | POLYGON ((440843.416 4643765.403, 440987.754 4... | 0.003499 |
3277 | 440843.416087 | 4.645515e+06 | 441420.766356 | 4.645015e+06 | 4144 | 216506.350946 | POLYGON ((440843.416 4645265.403, 440987.754 4... | 0.003427 |
3278 | 440843.416087 | 4.645015e+06 | 441420.766356 | 4.644515e+06 | 4145 | 216506.350946 | POLYGON ((440843.416 4644765.403, 440987.754 4... | 0.003445 |
3279 rows × 8 columns
%%time
result = normalization (result, resource_dropdown.value)
CPU times: user 2.59 ms, sys: 0 ns, total: 2.59 ms Wall time: 2.38 ms
result.head()
 | left | top | right | bottom | id | area | geometry | hospital_icu_beds
---|---|---|---|---|---|---|---|---
0 | 440843.416087 | 4.638515e+06 | 441420.766356 | 4.638015e+06 | 4158 | 216661.173 | POLYGON ((351469.371 580527.566, 351609.858 58... | 0.903274 |
1 | 440843.416087 | 4.638015e+06 | 441420.766356 | 4.637515e+06 | 4159 | 216661.168 | POLYGON ((351477.143 580027.445, 351617.630 58... | 0.929614 |
2 | 440843.416087 | 4.639515e+06 | 441420.766356 | 4.639015e+06 | 4156 | 216661.169 | POLYGON ((351453.825 581527.810, 351594.311 58... | 0.929785 |
3 | 440843.416087 | 4.639015e+06 | 441420.766356 | 4.638515e+06 | 4157 | 216661.171 | POLYGON ((351461.598 581027.688, 351602.085 58... | 0.911208 |
4 | 440843.416087 | 4.640515e+06 | 441420.766356 | 4.640015e+06 | 4154 | 216661.171 | POLYGON ((351438.276 582528.054, 351578.761 58... | 0.953645 |
Plot accessibility scores and hospital points over hexagon grid.
%%time
hospitals = hospitals.to_crs({'init': 'epsg:26971'})
result = result.to_crs({'init': 'epsg:26971'})
output_map(result, pop_data, hospitals, resource_dropdown.value)
CPU times: user 1.77 s, sys: 129 ms, total: 1.9 s Wall time: 1.7 s
Plot accessibility scores with equal breaks classification and hospital points over hexagon grid.
def output_map_classified(output_grid, hospitals, resource):
    ax = output_grid.plot(column=resource,
                          scheme='Equal_Interval',
                          k=5,
                          linewidth=0,
                          cmap='Blues',
                          figsize=(18,12),
                          legend=True,
                          label="Acc Measure",
                          zorder=1)
    # Next two lines set bounds for our x- and y-axes because it looks like there's a weird
    # point at the bottom left of the map that's messing up our frame (Maja)
    ax.set_xlim([325000, 370000])
    ax.set_ylim([550000, 600000])
    hospitals.plot(ax=ax,
                   markersize=10,
                   zorder=2,
                   c='black',
                   legend=True,
                   label="Hospital"
                   )
    # ax.legend(loc="upper right") # add hospital legend
output_map_classified(result, hospitals, resource_dropdown.value)
# save as image with file name including the resource value, population value, and buffered / not buffered
plt.savefig('./results/figures/reproduction/{}_{}_buff_classified_spdLimit.png'.format(population_dropdown.value, resource_dropdown.value))
Load results provided with the original research compendium from the shapefile data/derived/public/Chicago_ACC.shp. The file contains the fields hospital_i for "ICU Beds" and hospital_v for "Ventilators". It is not known exactly which version of the code was used to create this set of results, although it appears that a more complete road network was used (probably the full state of Illinois) than was provided with the compendium (Chicago only, without a buffer). It is also not known whether the population was the population at risk (over 50 years old) or COVID-19 patients (cases). However, the statistical and geographic distributions of each are extremely similar once the data has been normalized.
# Import study results to compare
# hospital_i assumed to be for ICU and hospital_v assumed to be for ventilator
# however it's unknown whether the population is the COVID-19 population or the AT RISK population
fp = 'data/derived/public/Chicago_ACC.shp'
og_result = gpd.read_file(fp)
og_result.head()
 | id | hospital_i | hospital_v | geometry
---|---|---|---|---
0 | 4158 | 0.844249 | 0.843439 | POLYGON ((-87.71312 41.89411, -87.71140 41.896... |
1 | 4159 | 0.843600 | 0.843031 | POLYGON ((-87.71307 41.88961, -87.71135 41.891... |
2 | 4156 | 0.906094 | 0.904699 | POLYGON ((-87.71322 41.90312, -87.71150 41.905... |
3 | 4157 | 0.877197 | 0.876503 | POLYGON ((-87.71317 41.89861, -87.71145 41.900... |
4 | 4154 | 0.911424 | 0.910002 | POLYGON ((-87.71332 41.91212, -87.71160 41.914... |
Here, we join the current results from the reanalysis to the results of the original study so that they share a consistent ordering and Spearman's Rho can be calculated.
result.set_index("id")
result_compare = result.join(og_result[["hospital_i","hospital_v"]])
result_compare.head()
 | left | top | right | bottom | id | area | geometry | hospital_icu_beds | hospital_i | hospital_v
---|---|---|---|---|---|---|---|---|---|---
0 | 440843.416087 | 4.638515e+06 | 441420.766356 | 4.638015e+06 | 4158 | 216661.173 | POLYGON ((351469.371 580527.566, 351609.858 58... | 0.903274 | 0.844249 | 0.843439 |
1 | 440843.416087 | 4.638015e+06 | 441420.766356 | 4.637515e+06 | 4159 | 216661.168 | POLYGON ((351477.143 580027.445, 351617.630 58... | 0.929614 | 0.843600 | 0.843031 |
2 | 440843.416087 | 4.639515e+06 | 441420.766356 | 4.639015e+06 | 4156 | 216661.169 | POLYGON ((351453.825 581527.810, 351594.311 58... | 0.929785 | 0.906094 | 0.904699 |
3 | 440843.416087 | 4.639015e+06 | 441420.766356 | 4.638515e+06 | 4157 | 216661.171 | POLYGON ((351461.598 581027.688, 351602.085 58... | 0.911208 | 0.877197 | 0.876503 |
4 | 440843.416087 | 4.640515e+06 | 441420.766356 | 4.640015e+06 | 4154 | 216661.171 | POLYGON ((351438.276 582528.054, 351578.761 58... | 0.953645 | 0.911424 | 0.910002 |
With the new implementations from this reanalysis, the correlation is calculated between the results of this reanalysis and those of the original study to see how much the changes have influenced the relationship between the two.
# set the original_resource variable to the name of the original results column matching the modeling choice
from scipy import stats
if resource_dropdown.value == "hospital_icu_beds":
    original_resource = "hospital_i"
else:
    original_resource = "hospital_v"
# report which resources and populations are being compared
correlationmsg = ("Comparing:\nOriginal resource: " + resource_dropdown.value + " and population: unknown" +
                  "\nReproduction resource: " + resource_dropdown.value + " and population: " + population_dropdown.value)
print(correlationmsg)
# calculate Spearman's rho
rho = stats.spearmanr(result_compare[[original_resource, resource_dropdown.value]])
rho_result = "Rho = " + str(round(rho.correlation, 3)) + ", pvalue = " + str(round(rho.pvalue, 3))
print(rho_result)
Comparing:
Original resource: hospital_icu_beds and population: unknown
Reproduction resource: hospital_icu_beds and population: pop
Rho = 0.747, pvalue = 0.0
The scatterplot helps visualize the correlation.
plt.scatter(result_compare[[original_resource]], result_compare[[resource_dropdown.value]], s=1)
plt.xlabel("Original", labelpad=5)
plt.ylabel("Reproduction", labelpad=5)
plt.text(.45, .08, rho_result, fontsize=8)
# save plot as image with a file name including the resource (icu/ventilators) and population values
plt.savefig("./results/figures/reproduction/compare_{}_buffer_{}.png".format(resource_dropdown.value, population_dropdown.value))
The changes made in this reanalysis have influenced the results in important ways.
Looking at the classified accessibility map with the default settings (buffered network, population over 50, ICU beds, and all hospitals), and comparing it to its earlier rendition in our initial reproduction study, the zone of highest accessibility has expanded: more healthcare resources can be accessed by more people than the reproduction suggested was possible. It extends outwards in all directions, but especially into the northwest and southwest regions of the city. Changing the settings to ventilators did not influence the map significantly; however, when "pop50" was changed to "covid patients" the map did not change much from the original reproduction.
Overall, with the default settings, the Spearman's Rho between the results of this reanalysis and the original Kang et al. (2020) study is 0.747, compared to 0.532 for our initial reproduction. This indicates that the reanalysis was substantially more successful at reproducing the original results. Since expanding the road network was the only implementation intended to improve reproducibility, this stronger correlation with the original study should be attributed to the buffered network rather than to the additional reanalysis implementations of OSMNX road-type speed limits and area-weighted reaggregation. This is a large increase toward positive 1, which would represent a perfect positive correlation with, and complete identicality to, the original study results.
Expanding the road network of Chicago with a 15-mile buffer outside the city led to a road network map with a larger extent and drastically increased the number of road edges, from 75,895 in the original reproduction to 384,240. The expanded road network likely produced the larger zones of accessibility by accounting for people living outside the city who might access hospitals inside it.
Improving the road network speed dataset also played a role in the larger accessibility zones; however, the relationship was not as direct. In the first reproduction, missing speed limit values were replaced with a generalization of 35 mph. This reanalysis replaced those generalizations with more plausible speed limit values based on the actual road type. The most common speed limit in the original reproduction's network was 30 mph, whereas it was 25 mph in this reanalysis with the OSMNX speed limits imputed by road type and the larger network (refer to the Check Speed Limits sections). In other words, the original reproduction overestimated how fast one could reach hospital resources, which seems to contradict the result of larger accessibility zones. In addition to having more segments with 25 mph speed limits, however, this reanalysis also had more segments with higher speed limits, such as those in the 60–70 mph range, whereas the original version included no segments with a 70 mph speed limit. Because this reanalysis included more segments with speed limits both below and above those of the original, the improved speed limit data enhanced content validity by improving the accuracy of the results, rather than pushing the size of the accessibility zones in one particular direction.
The map not changing when the settings were switched from hospital beds to ventilators supports the original assumption that hospital beds and ventilators are highly correlated in their influence on accessibility. The lack of change from the original map when the settings were changed from "pop50" to "covid patients" indicates that none of our changes to this study influenced accessibility for COVID-19 patients. A possible explanation could be the expansion of the road network: seeing how it showed higher accessibility for populations at risk (over 50) but not for COVID-19 patients, further internal validity tests could examine the relationship between the expanded road network and the COVID-19 patient data.
The larger accessibility zones could also be attributed to our implementation of area-weighted reaggregation (AWR) to replace the 50% threshold used in the original study and our initial reproduction. The initial method of aggregating the data into the hexagonal grid, which checked whether 50% or more of each hospital catchment fragment intersected a hexagon, likely generalized the results by underrepresenting data in the hexagons. By weighting every intersecting slice of the catchment areas in the hexagons, we not only addressed that source of uncertainty, but the fuller representation of data in the hexagonal grid could also have contributed to the final results of increased accessibility.
Ultimately, this reanalysis of our original reproduction of Kang et al. (2020) was successful both in improving the reproducibility of the study and in improving the study itself by implementing changes that tackle critical areas of uncertainty. While these implementations made strides in this body of scientific knowledge, there remain threats to validity that could be addressed in future work.
The most tangible next step is addressing the threat of partition distortion, which describes how decreasing the number of partitions in a dataset increases distortion because of the complexities in how the data must be aggregated. It applies here in how the study area is represented by a hexagonal grid, with the partitions being the hexagon boundaries. The size of the hexagons, which determines the number of hexagons and partitions, seems to have been chosen arbitrarily. Further validity tests could determine whether decreasing the size of the hexagons (increasing the number of partitions) would decrease the distortion of the results, producing a more accurate representation of spatial accessibility to healthcare resources.
To expand the study even further, an additional approach is to examine the limitations of a transportation network based solely on personal vehicles. Assuming that everyone in the city of Chicago reaches healthcare resources by personal vehicle is a gross generalization that ignores the large populations who rely on public transportation, walking, or biking to access their essential needs. Implementing a multi-modal transportation network would be a substantial step toward improving the power of this study, especially in the ways it would represent accessibility more accurately for the modes of transportation that underserved low-income populations typically rely on instead of a personal vehicle.
Lastly, the time-space threat to validity posed by traffic congestion could be addressed by investigating how different times of day influence the capability to access healthcare resources, incorporating traffic data to estimate more accurately how fast cars can travel along busier road segments.
Overall, this study has made great strides in developing Kang et al. (2020) by improving reproducibility and addressing areas of uncertainty and threats to validity, yet great opportunities for future improvement remain.
Luo, W., & Qi, Y. (2009). An enhanced two-step floating catchment area (E2SFCA) method for measuring spatial accessibility to primary care physicians. Health & place, 15(4), 1100-1107.