Exploratory analysis (v0.12)¶

This notebook presents an exploratory analysis of the embeddings generated for the city of Leicester by our model v0.12 trained on a randomly sampled 1% of the nodes from 138 cities in the UK. See gnnuf_train_model_v0_12.py for further detail on the model training.

Table of contents:

  • Setup
  • Load data
    • Leicester OSMnx graph data
    • Leicester embeddings
  • Preliminary embeddings plots
    • Node embeddings
    • Combined angle and distance plot
    • Pooled embeddings
  • Exploring embedding patterns
    • Clusters (node embeddings)
      • Interactive clusters map
    • Bivariate quantiles (pooled embeddings)
      • Interactive bivariate map
    • Pooled embeddings bivariate quantiles
  • Classic measures
    • Correlations with node and ego-graph stats
    • City-wide centrality
    • Ego-graph centrality
    • Street length and count

Setup ¶

In [1]:
# Base libraries
import math
import numpy as np
import pandas as pd
import geopandas as gpd
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
# NetworkX
import networkx as nx
import osmnx as ox
# OS environment setup
from local_directories import *
In [2]:
# Reset random seeds
random_seed = 2674
# Other
neighbourhood_min_nodes = 8
max_distance = 500

Load data ¶

Leicester OSMnx graph data ¶

We used the data made available by Boeing, which include simplified street networks of 138 cities in the UK derived from OpenStreetMap.

In [3]:
# Load Leicester's graph
leicester_osmnx_graph = ox.io.load_graphml(bulk_storage_directory + "/osmnx/raw/leicester-1864.graphml")
leicester_osmnx_graph_prj = ox.project_graph(leicester_osmnx_graph)
In [4]:
len(list(leicester_osmnx_graph.nodes))
Out[4]:
13293
In [5]:
ox.plot_graph(
    leicester_osmnx_graph_prj,
    node_size=5, node_color="#000000",
    edge_color="#000000", edge_linewidth=0.1,
    bgcolor="#ffffff",
    figsize=(16, 16))
Out[5]:
(<Figure size 1600x1600 with 1 Axes>, <Axes: >)

The code below extracts the tabular data from the OSMnx format into a dataframe.

In [6]:
# Convert graph to dataframe version: build one single-row frame per node,
# then concatenate them once instead of growing the dataframe inside the loop
node_frames = []
for node in leicester_osmnx_graph_prj:
    node_dict = leicester_osmnx_graph_prj.nodes[node]
    # Store the node id alongside the other node attributes
    node_dict["osmnx_node_id"] = int(node)
    node_frames.append(pd.DataFrame.from_dict([node_dict]))
leicester_osmnx_graph_prj_df = pd.concat(node_frames)
leicester_osmnx_graph_prj_df.head()
Out[6]:
y x street_count elevation elevation_aster elevation_srtm lon lat osmnx_node_id ref highway
0 5.829804e+06 622151.977595 3 72.0 35 72 -1.196195 52.604506 194739 NaN NaN
0 5.829991e+06 622098.041002 3 72.0 45 72 -1.196922 52.606196 1551014281 NaN NaN
0 5.828827e+06 622259.813792 2 79.0 57 79 -1.194965 52.595696 326312 21 motorway_junction
0 5.830107e+06 622077.742140 3 79.0 43 79 -1.197179 52.607245 326320 21 motorway_junction
0 5.829673e+06 622220.645785 3 74.0 35 74 -1.195230 52.603314 2627867454 NaN NaN

Leicester embeddings ¶

Load the pre-computed embeddings for Leicester. See gnnuf_embedding_model_v0_12_Leicester.py for further details.

In [7]:
# Load Leicester's embeddings
leicester_emb_df = pd.read_csv(this_repo_directory + "/data/leicester-1864_emb_gnnuf_model_v0-12.csv")
leicester_emb_df.head()
Out[7]:
osmnx_node_id EMB000 EMB001
0 337976 0.700673 -0.058294
1 337979 1.052401 -0.071909
2 337983 1.176129 -0.014825
3 337985 1.200868 0.031910
4 337986 0.967397 0.003360
In [8]:
# Load Leicester's pooled embeddings
leicester_emb_pool_df = pd.read_csv(this_repo_directory + "/data/leicester-1864_emb-pool_gnnuf_model_v0-12.csv")
leicester_emb_pool_df.head()
Out[8]:
osmnx_node_id EMB000 EMB001
0 337976 0.929014 -0.045372
1 337979 0.911989 -0.045298
2 337983 0.929369 -0.041823
3 337985 0.930489 -0.040748
4 337986 0.929369 -0.041823

Preliminary embeddings plots ¶

Node embeddings ¶

Let's start with a simple scatterplot showing the embeddings obtained for street junctions in Leicester, encoding the first embedding on the x-axis and the second embedding on the y-axis.

In [9]:
def bounded_min_max(x, min_val, max_val):
    if x < min_val:
        return 0
    elif x > max_val:
        return 1
    else:
        return (x - min_val) / (max_val - min_val)

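# Distance from the origin, rescaled to [0, 1] between the bounds 0.75 and 1.5
leicester_emb_df["EMB_dist"] = leicester_emb_df.apply(lambda x:
    bounded_min_max(math.sqrt(x["EMB000"]**2 + x["EMB001"]**2), 0.75, 1.5),
    axis=1)
# Sine of the angle between the embedding vector and the positive x-axis
leicester_emb_df["EMB_angl"] = leicester_emb_df.apply(lambda x:
    math.sin(math.atan2(x["EMB001"], x["EMB000"])),
    axis=1)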
In [10]:
def embeddings_colour(emb000, emb001):
    dist = math.sqrt(emb000**2 + emb001**2)
    angl = math.sin(math.atan2(emb001, emb000))
    if dist < 0.7:
        return "#000000"
    elif dist < 1.35:
        if angl < -0.8660:
            return "#fde725"
        elif angl < -0.5:
            return "#addc30"
        elif angl < 0.0:
            return "#5ec962"
        elif angl < 0.5:
            return "#28ae80"
        elif angl < 0.8660:
            return "#21918c"
        else:
            return "#2c728e"
    else:
        if angl < -0.8660:
            return "#f9cb35"
        elif angl < -0.5:
            return "#f98e09"
        elif angl < 0.0:
            return "#e45a31"
        elif angl < 0.5:
            return "#bc3754"
        elif angl < 0.8660:
            return "#8a226a"
        else:
            return "#57106e"

leicester_emb_df["EMB_colr"] = leicester_emb_df.apply( lambda x: 
    embeddings_colour(x["EMB000"], x["EMB001"]),
    axis=1)
In [11]:
# Index the embeddings by node id once, instead of filtering the dataframe for every node
leicester_emb_by_node = leicester_emb_df.drop_duplicates("osmnx_node_id").set_index("osmnx_node_id")
for node in leicester_osmnx_graph_prj.nodes:
    node_attrs = leicester_osmnx_graph_prj.nodes[node]
    if node not in leicester_emb_by_node.index:
        # Nodes without embeddings get empty values and a neutral grey
        for col in ["EMB000", "EMB001", "EMB_dist", "EMB_angl"]:
            node_attrs[col] = None
        node_attrs["EMB_colr"] = "#cccccc"
    else:
        node_emb = leicester_emb_by_node.loc[node]
        for col in ["EMB000", "EMB001", "EMB_dist", "EMB_angl"]:
            node_attrs[col] = float(node_emb[col])
        node_attrs["EMB_colr"] = node_emb["EMB_colr"]
In [12]:
ox.plot_graph(leicester_osmnx_graph_prj, node_color=[
    leicester_osmnx_graph_prj.nodes[node]["EMB000"] for node in leicester_osmnx_graph_prj.nodes],
    node_size=10, bgcolor="#ffffff",
    figsize=(16, 16))
Out[12]:
(<Figure size 1600x1600 with 1 Axes>, <Axes: >)
In [13]:
ox.plot_graph(leicester_osmnx_graph_prj, node_color=[
    leicester_osmnx_graph_prj.nodes[node]["EMB001"] for node in leicester_osmnx_graph_prj.nodes],
    node_size=10, bgcolor="#ffffff",
    figsize=(16, 16))
Out[13]:
(<Figure size 1600x1600 with 1 Axes>, <Axes: >)

We can then explore the values in more detail by looking at each node's position relative to the origin, expressed as a distance and an angle.
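As a worked example of these two derived quantities, the sketch below recomputes them for a single junction (node 337979 in the embeddings table above) using only the standard library. It mirrors bounded_min_max with the same 0.75 and 1.5 bounds; the helper name polar_features is hypothetical. Note that sin(atan2(y, x)) equals y divided by the distance, so EMB_angl stores the sine of the angle rather than the angle itself.

```python
import math

def bounded_min_max(x, min_val, max_val):
    """Rescale x to [0, 1], clipping values outside [min_val, max_val]."""
    if x < min_val:
        return 0.0
    if x > max_val:
        return 1.0
    return (x - min_val) / (max_val - min_val)

def polar_features(emb000, emb001, min_dist=0.75, max_dist=1.5):
    """Hypothetical helper: return (rescaled distance, sine of the angle)."""
    dist = math.hypot(emb000, emb001)
    angl = math.sin(math.atan2(emb001, emb000))
    return bounded_min_max(dist, min_dist, max_dist), angl

# Embedding values of node 337979, copied from the table above
emb_dist, emb_angl = polar_features(1.052401, -0.071909)
print(round(emb_dist, 4), round(emb_angl, 4))
```

Up to the rounding of the inputs, the two values agree with the EMB_dist and EMB_angl columns reported for this node in the later tables.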

In [14]:
plt.figure(figsize=(7,7))
ax = plt.axes()
ax.set_facecolor("white")
plt.scatter(
    x=leicester_emb_df.EMB000,
    y=leicester_emb_df.EMB001,
    c=leicester_emb_df.EMB_dist,
    s=10, edgecolors='black', linewidth=0.1)
plt.xlabel("Embeddings first dimension")
plt.ylabel("Embeddings second dimension")
plt.show()
In [15]:
fig = px.scatter(
    leicester_emb_df,
    x="EMB000",
    y="EMB001",
    color="EMB_dist",
    hover_data=['osmnx_node_id'],
    width=800, height=800,
    color_continuous_scale='viridis'
)
fig.update_layout(
    {"plot_bgcolor": "#ffffff"},
    xaxis=dict(scaleanchor="y", scaleratio=1)
)
fig.update_xaxes(showgrid=True, gridwidth=1, gridcolor='#cccccc', zeroline=True, zerolinewidth=1, zerolinecolor='#cccccc')
fig.update_yaxes(showgrid=True, gridwidth=1, gridcolor='#cccccc', zeroline=True, zerolinewidth=1, zerolinecolor='#cccccc')
fig.show()
In [16]:
ox.plot_graph(leicester_osmnx_graph_prj, node_color=[
    leicester_osmnx_graph_prj.nodes[node]["EMB_dist"] for node in leicester_osmnx_graph_prj.nodes],
    node_size=10, bgcolor="#ffffff",
    figsize=(16, 16))
Out[16]:
(<Figure size 1600x1600 with 1 Axes>, <Axes: >)
In [17]:
plt.figure(figsize=(7,7))
ax = plt.axes()
ax.set_facecolor("white")
plt.scatter(
    x=leicester_emb_df.EMB000,
    y=leicester_emb_df.EMB001,
    c=leicester_emb_df.EMB_angl,
    s=10, edgecolors='black', linewidth=0.1)
plt.xlabel("Embeddings first dimension")
plt.ylabel("Embeddings second dimension")
plt.show()
In [18]:
fig = px.scatter(
    leicester_emb_df,
    x="EMB000",
    y="EMB001",
    color="EMB_angl",
    hover_data=['osmnx_node_id'],
    width=800, height=800,
    color_continuous_scale='viridis'
)
fig.update_layout(
    {"plot_bgcolor": "#ffffff"},
    xaxis=dict(scaleanchor="y", scaleratio=1)
)
fig.update_xaxes(showgrid=True, gridwidth=1, gridcolor='#cccccc', zeroline=True, zerolinewidth=1, zerolinecolor='#cccccc')
fig.update_yaxes(showgrid=True, gridwidth=1, gridcolor='#cccccc', zeroline=True, zerolinewidth=1, zerolinecolor='#cccccc')
fig.show()
In [19]:
ox.plot_graph(leicester_osmnx_graph_prj, node_color=[
    leicester_osmnx_graph_prj.nodes[node]["EMB_angl"] for node in leicester_osmnx_graph_prj.nodes],
    node_size=10, bgcolor="#ffffff",
    figsize=(16, 16))
Out[19]:
(<Figure size 1600x1600 with 1 Axes>, <Axes: >)

Combined angle and distance plot ¶
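
The combined colour scheme pairs a distance band with one of six bands of the sine of the angle. The thresholds used in embeddings_colour (±0.5 and ±0.8660) are the sines of ±30° and ±60°, so for points with a positive first dimension each band covers 30° of angle. A minimal sketch of that banding follows, with the palette of the middle distance band (0.7 ≤ dist < 1.35) copied from the function above; the names angle_band, SIN_THRESHOLDS and MID_BAND_PALETTE are hypothetical.

```python
import math
from bisect import bisect_right

# Thresholds on sin(angle), as used in embeddings_colour
SIN_THRESHOLDS = [-0.8660, -0.5, 0.0, 0.5, 0.8660]
# Colours of the middle distance band, in the order they appear in embeddings_colour
MID_BAND_PALETTE = ["#fde725", "#addc30", "#5ec962", "#28ae80", "#21918c", "#2c728e"]

def angle_band(angl):
    """Return the index (0 to 5) of the band containing sin(angle) = angl."""
    # bisect_right counts the thresholds that are <= angl,
    # which matches the chain of strict "<" tests in embeddings_colour
    return bisect_right(SIN_THRESHOLDS, angl)

# The thresholds expressed as angles, rounded to whole degrees
threshold_degrees = [round(math.degrees(math.asin(t))) for t in SIN_THRESHOLDS]
print(threshold_degrees)
print(MID_BAND_PALETTE[angle_band(-0.068169)])
```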

In [20]:
plt.figure(figsize=(7,7))
ax = plt.axes()
ax.set_facecolor("white")
plt.scatter(
    x=leicester_emb_df.EMB000,
    y=leicester_emb_df.EMB001,
    c=leicester_emb_df.EMB_colr,
    s=10, edgecolors='black', linewidth=0.1)
plt.xlabel("Embeddings first dimension")
plt.ylabel("Embeddings second dimension")
plt.show()
In [21]:
fig = go.Figure()
fig.add_trace(go.Scatter(
    x=leicester_emb_df.EMB000,
    y=leicester_emb_df.EMB001,
    mode='markers',
    marker=dict(color=leicester_emb_df.EMB_colr)
))
fig.update_layout({"plot_bgcolor": "#ffffff"}, width=800, height=800)
fig.update_xaxes(showgrid=True, gridwidth=1, gridcolor='#cccccc', zeroline=True, zerolinewidth=1, zerolinecolor='#cccccc')
fig.update_yaxes(showgrid=True, gridwidth=1, gridcolor='#cccccc', zeroline=True, zerolinewidth=1, zerolinecolor='#cccccc')
fig.show()
In [22]:
ox.plot_graph(leicester_osmnx_graph_prj, node_color=[
    leicester_osmnx_graph_prj.nodes[node]["EMB_colr"] for node in leicester_osmnx_graph_prj.nodes],
    node_size=10, bgcolor="#ffffff",
    figsize=(16, 16))
Out[22]:
(<Figure size 1600x1600 with 1 Axes>, <Axes: >)

Pooled embeddings ¶

In [23]:
fig = px.scatter(
    leicester_emb_pool_df,
    x="EMB000",
    y="EMB001",
    hover_data=['osmnx_node_id'],
    width=800, height=800
)
fig.update_layout({"plot_bgcolor": "#ffffff"})
fig.update_xaxes(showgrid=True, gridwidth=1, gridcolor='#cccccc', zeroline=True, zerolinewidth=1, zerolinecolor='#cccccc')
fig.update_yaxes(showgrid=True, gridwidth=1, gridcolor='#cccccc', zeroline=True, zerolinewidth=1, zerolinecolor='#cccccc')
fig.show()
In [24]:
# Index the pooled embeddings by node id once, instead of filtering the dataframe for every node
leicester_emb_pool_by_node = leicester_emb_pool_df.drop_duplicates("osmnx_node_id").set_index("osmnx_node_id")
for node in leicester_osmnx_graph_prj.nodes:
    node_attrs = leicester_osmnx_graph_prj.nodes[node]
    if node not in leicester_emb_pool_by_node.index:
        node_attrs["EMB000pool"] = None
        node_attrs["EMB001pool"] = None
    else:
        node_attrs["EMB000pool"] = float(leicester_emb_pool_by_node.at[node, "EMB000"])
        node_attrs["EMB001pool"] = float(leicester_emb_pool_by_node.at[node, "EMB001"])
In [25]:
ox.plot_graph(leicester_osmnx_graph_prj, node_color=[
    leicester_osmnx_graph_prj.nodes[node]["EMB000pool"] for node in leicester_osmnx_graph_prj.nodes],
    node_size=10, bgcolor="#ffffff",
    figsize=(16, 16))
Out[25]:
(<Figure size 1600x1600 with 1 Axes>, <Axes: >)
In [26]:
ox.plot_graph(leicester_osmnx_graph_prj, node_color=[
    leicester_osmnx_graph_prj.nodes[node]["EMB001pool"] for node in leicester_osmnx_graph_prj.nodes],
    node_size=10, bgcolor="#ffffff",
    figsize=(16, 16))
Out[26]:
(<Figure size 1600x1600 with 1 Axes>, <Axes: >)

Exploring embedding patterns ¶

In this section, we further explore the patterns in the embedding values and their spatial distribution.

Clusters (node embeddings) ¶

We then illustrate the clusters of embeddings obtained using HDBSCAN (seven clusters, plus a group of unclustered noise points labelled -1) and how the related nodes are spatially distributed.

In [27]:
leicester_emb_patters_df = leicester_emb_df.merge(
    # Ego-graph pooled embeddings
        leicester_emb_pool_df.rename(columns={"EMB000":"EMB000pooled", "EMB001":"EMB001pooled"}),
        on="osmnx_node_id"
    )

leicester_osmnx_patters = leicester_osmnx_graph_prj.copy()
In [28]:
# from sklearn.cluster import DBSCAN
# clust = DBSCAN(eps=0.07, min_samples=100)

import hdbscan
clust = hdbscan.HDBSCAN(min_cluster_size=200, min_samples=10)


leicester_emb_df_clust = leicester_emb_patters_df[["EMB000", "EMB001"]].dropna()
leicester_emb_patters_df["clust"] = clust.fit_predict(leicester_emb_df_clust)
leicester_emb_patters_df["clust"].nunique()
Out[28]:
8
In [29]:
clust_sizes = leicester_emb_patters_df[leicester_emb_patters_df['clust']>-1]['clust'].value_counts()
clust_mapping = {old: new for new, old in enumerate(clust_sizes.index, start=0)}
clust_mapping.update({-1: -1})
leicester_emb_patters_df['clust'] = leicester_emb_patters_df['clust'].map(clust_mapping)

print(leicester_emb_patters_df.groupby('clust').size().reset_index(name='counts'))
   clust  counts
0     -1    1652
1      0    6723
2      1    1410
3      2     897
4      3     782
5      4     656
6      5     480
7      6     384
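
The relabelling in the cell above renumbers the clusters so that 0 is the largest, 1 the second largest, and so on, while the noise label -1 is left unchanged. A minimal standard-library sketch of the same idea on toy labels (the function name relabel_by_size and the toy data are hypothetical):

```python
from collections import Counter

def relabel_by_size(labels, noise_label=-1):
    """Renumber cluster labels by decreasing size, keeping the noise label as is."""
    sizes = Counter(label for label in labels if label != noise_label)
    # most_common() sorts by decreasing count; ties keep first-seen order
    mapping = {old: new for new, (old, _) in enumerate(sizes.most_common())}
    mapping[noise_label] = noise_label
    return [mapping[label] for label in labels]

# Toy labels: cluster 2 has four members, cluster 0 three, cluster 1 one
toy_labels = [2, 2, 2, 0, 0, 1, -1, 2, 0]
relabelled = relabel_by_size(toy_labels)
print(relabelled)
```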
In [30]:
colorbrewer_set1 = ["#377eb8", "#e41a1c", "#4daf4a", "#984ea3", "#ff7f00", "#ffff33", "#a65628", "#f781bf", "#999999"]
colorbrewer_paired12 = ["#a6cee3", "#1f78b4", "#b2df8a", "#33a02c", "#fb9a99", "#e31a1c", "#fdbf6f", "#ff7f00", "#cab2d6", "#6a3d9a", "#ffff99", "#b15928", "#cccccc"]
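# Map each cluster to a colour; the noise label -1 picks the last entry (grey) through negative indexing
leicester_emb_patters_df["clust_colour"] = leicester_emb_patters_df["clust"].apply(lambda x: colorbrewer_set1[x])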
leicester_emb_patters_df.head()
Out[30]:
osmnx_node_id EMB000 EMB001 EMB_dist EMB_angl EMB_colr EMB000pooled EMB001pooled clust clust_colour
0 337976 0.700673 -0.058294 0.000000 -0.082911 #5ec962 0.929014 -0.045372 -1 #999999
1 337979 1.052401 -0.071909 0.406472 -0.068169 #5ec962 0.911989 -0.045298 2 #4daf4a
2 337983 1.176129 -0.014825 0.568296 -0.012604 #5ec962 0.929369 -0.041823 2 #4daf4a
3 337985 1.200868 0.031910 0.601723 0.026563 #28ae80 0.930489 -0.040748 2 #4daf4a
4 337986 0.967397 0.003360 0.289871 0.003474 #28ae80 0.929369 -0.041823 5 #ffff33
In [31]:
plt.figure(figsize=(7,7))
ax = plt.axes()
ax.set_facecolor("white")
plt.scatter(
    x=leicester_emb_patters_df.EMB000,
    y=leicester_emb_patters_df.EMB001,
    c=leicester_emb_patters_df.clust_colour,
    s=5, edgecolors='black', linewidth=0.1)
plt.xlabel("Embeddings first dimension")
plt.ylabel("Embeddings second dimension")
plt.show()
In [32]:
fig = go.Figure()
fig.add_trace(go.Scatter(
    x=leicester_emb_patters_df.EMB000,
    y=leicester_emb_patters_df.EMB001,
    mode='markers',
    marker=dict(color=leicester_emb_patters_df.clust_colour)
))
fig.update_layout({"plot_bgcolor": "#ffffff"}, width=800, height=800)
fig.update_xaxes(showgrid=True, gridwidth=1, gridcolor='#cccccc', zeroline=True, zerolinewidth=1, zerolinecolor='#cccccc')
fig.update_yaxes(showgrid=True, gridwidth=1, gridcolor='#cccccc', zeroline=True, zerolinewidth=1, zerolinecolor='#cccccc')
fig.show()
In [33]:
for node in leicester_osmnx_patters.nodes:
    node_bivariate_colour = leicester_emb_patters_df.loc[leicester_emb_patters_df["osmnx_node_id"] == node]
    if node_bivariate_colour.empty:
        leicester_osmnx_patters.nodes[node]["clust_colour"] = "#000000"
        leicester_osmnx_patters.nodes[node]["node_size"] = 1
    else:
        leicester_osmnx_patters.nodes[node]["clust_colour"] = node_bivariate_colour["clust_colour"].values[0]
        leicester_osmnx_patters.nodes[node]["node_size"] = 7
In [34]:
ox.plot_graph(
    leicester_osmnx_patters,
    node_color=[leicester_osmnx_patters.nodes[node]["clust_colour"] for node in leicester_osmnx_patters.nodes],
    node_size=[leicester_osmnx_patters.nodes[node]["node_size"] if leicester_osmnx_patters.nodes[node]["clust_colour"]!=colorbrewer_set1[-1] else 1 for node in leicester_osmnx_patters.nodes],
    bgcolor="#ffffff", edge_color="#000000", edge_linewidth=0.1,
    figsize=(12, 12))
Out[34]:
(<Figure size 1200x1200 with 1 Axes>, <Axes: >)

Prepare geopandas dataframe for interactive maps.

In [35]:
leicester_gdf = gpd.GeoDataFrame(
    leicester_osmnx_graph_prj_df,
    geometry=gpd.points_from_xy(
        leicester_osmnx_graph_prj_df.lon,
        leicester_osmnx_graph_prj_df.lat
    ),
    crs="EPSG:4326"
).merge(leicester_emb_patters_df, on='osmnx_node_id', how='left')
leicester_gdf.head()
Out[35]:
y x street_count elevation elevation_aster elevation_srtm lon lat osmnx_node_id ref ... geometry EMB000 EMB001 EMB_dist EMB_angl EMB_colr EMB000pooled EMB001pooled clust clust_colour
0 5.829804e+06 622151.977595 3 72.0 35 72 -1.196195 52.604506 194739 NaN ... POINT (-1.1962 52.60451) NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 5.829991e+06 622098.041002 3 72.0 45 72 -1.196922 52.606196 1551014281 NaN ... POINT (-1.19692 52.6062) NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 5.828827e+06 622259.813792 2 79.0 57 79 -1.194965 52.595696 326312 21 ... POINT (-1.19496 52.5957) NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 5.830107e+06 622077.742140 3 79.0 43 79 -1.197179 52.607245 326320 21 ... POINT (-1.19718 52.60724) NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 5.829673e+06 622220.645785 3 74.0 35 74 -1.195230 52.603314 2627867454 NaN ... POINT (-1.19523 52.60331) 1.00017 -0.058451 0.335835 -0.058341 #5ec962 0.946935 -0.034518 5.0 #ffff33

5 rows × 21 columns
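
The NaN values in the first rows above arise because the left merge keeps every graph node, including those for which no embedding is available. The sketch below reproduces this on toy data and counts the unmatched rows with the indicator option of pandas' merge; the toy dataframes and variable names are hypothetical.

```python
import pandas as pd

# Toy stand-ins for the node table and the embeddings table
nodes = pd.DataFrame({"osmnx_node_id": [1, 2, 3], "street_count": [3, 2, 4]})
emb = pd.DataFrame({"osmnx_node_id": [2, 3], "EMB000": [1.05, 0.97]})

# how="left" keeps all nodes; indicator=True adds a "_merge" column
# flagging rows found only in the left table
merged = nodes.merge(emb, on="osmnx_node_id", how="left", indicator=True)
n_missing = int((merged["_merge"] == "left_only").sum())
print(merged)
print("Nodes without embeddings:", n_missing)
```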

Interactive clusters map ¶

In [36]:
leicester_gdf[leicester_gdf["clust_colour"]!=colorbrewer_set1[-1]].dropna(subset=["EMB000"]).explore(
    color="clust_colour",
    marker_kwds={"radius": 7}, style_kwds={"stroke": False},
    tiles="CartoDB positron"
)

Bivariate quantiles (pooled embeddings) ¶

We split each pooled embedding dimension into three quantile classes (terciles) and combine them into a 3x3 bivariate colour scheme, which is then replicated in the maps further below to illustrate how the bivariate classes are spatially distributed in Leicester.
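
The classification can also be read as a lookup in a 3x3 table: each dimension is assigned class 0, 1 or 2 according to its two cut points, and the pair of classes selects a colour. The sketch below illustrates this with the palette used by the bivariate_colour function in the next cell. The cut points are hypothetical values chosen for illustration, not the Leicester quantiles, and the names tercile_class and bivariate_lookup are hypothetical too.

```python
from bisect import bisect_left

# 3x3 palette copied from bivariate_colour: rows follow the class of the first
# dimension, columns the class of the second (0 = lowest third, 2 = highest third)
BIVARIATE_PALETTE = [
    ["#e8e8e8", "#e4acac", "#c85a5a"],
    ["#b0d5df", "#ad9ea5", "#985356"],
    ["#64acbe", "#627f8c", "#574249"],
]

def tercile_class(value, cut_points):
    """Return 0, 1 or 2 depending on where value falls relative to two cut points."""
    # bisect_left counts the cut points strictly below value,
    # which matches the "<=" comparisons in bivariate_colour
    return bisect_left(cut_points, value)

def bivariate_lookup(first, second, first_cuts, second_cuts):
    """Pick the colour for a pair of values given the cut points of each dimension."""
    return BIVARIATE_PALETTE[tercile_class(first, first_cuts)][tercile_class(second, second_cuts)]

# Hypothetical cut points, for illustration only
first_cuts = [0.50, 0.90]
second_cuts = [-0.05, -0.04]
print(bivariate_lookup(0.20, -0.10, first_cuts, second_cuts))
print(bivariate_lookup(0.95, -0.045, first_cuts, second_cuts))
```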

In [37]:
# This function assigns a bivariate colour based on the two quantile cut points of each dimension
def bivariate_colour(x, limits):
    if x[0] is None or x[1] is None:
        return None
    else:
        if x[0] <= limits[0, 0]:
            if x[1] <= limits[1, 0]:
                # return "#e8e8e8"
                return "#e8e8e8"
            elif x[1] <= limits[1, 1]:
                # return "#cbb8d7"
                return "#e4acac"
            else:
                # return "#9972af"
                return "#c85a5a"
        elif x[0] <= limits[0, 1]:
            if x[1] <= limits[1, 0]:
                # return "#e4d9ac"
                return "#b0d5df"
            elif x[1] <= limits[1, 1]:
                # return "#c8ada0"
                return "#ad9ea5"
            else:
                # return "#976b82"
                return "#985356"
        else:
            if x[1] <= limits[1, 0]:
                # return "#c8b35a"
                return "#64acbe"
            elif x[1] <= limits[1, 1]:
                # return "#af8e53"
                return "#627f8c"
            else:
                # return "#804d36"
                return "#574249"
In [38]:
leicester_emb_pooled_quantiles = leicester_emb_patters_df[["EMB000pooled", "EMB001pooled"]].quantile([1/3, 2/3]).values.transpose()
leicester_emb_patters_df["bivariate_colour_pooled"] = leicester_emb_patters_df.apply(
    lambda x: bivariate_colour([x["EMB000pooled"], x["EMB001pooled"]], 
        leicester_emb_pooled_quantiles
        ), axis=1
)
leicester_emb_patters_df.head()
Out[38]:
osmnx_node_id EMB000 EMB001 EMB_dist EMB_angl EMB_colr EMB000pooled EMB001pooled clust clust_colour bivariate_colour_pooled
0 337976 0.700673 -0.058294 0.000000 -0.082911 #5ec962 0.929014 -0.045372 -1 #999999 #627f8c
1 337979 1.052401 -0.071909 0.406472 -0.068169 #5ec962 0.911989 -0.045298 2 #4daf4a #627f8c
2 337983 1.176129 -0.014825 0.568296 -0.012604 #5ec962 0.929369 -0.041823 2 #4daf4a #627f8c
3 337985 1.200868 0.031910 0.601723 0.026563 #28ae80 0.930489 -0.040748 2 #4daf4a #627f8c
4 337986 0.967397 0.003360 0.289871 0.003474 #28ae80 0.929369 -0.041823 5 #ffff33 #627f8c
In [39]:
plt.figure(figsize=(7,7))
ax = plt.axes()
ax.set_facecolor("white")
plt.scatter(
    x=leicester_emb_patters_df.EMB000pooled,
    y=leicester_emb_patters_df.EMB001pooled,
    c=leicester_emb_patters_df.bivariate_colour_pooled,
    s=10, edgecolors='black', linewidth=0.1)
plt.xlabel("Pooled embeddings first dimension")
plt.ylabel("Pooled embeddings second dimension")
plt.show()
In [40]:
fig = go.Figure()
fig.add_trace(go.Scatter(
    x=leicester_emb_patters_df.EMB000pooled,
    y=leicester_emb_patters_df.EMB001pooled,
    mode='markers',
    marker=dict(color=leicester_emb_patters_df.bivariate_colour_pooled)
))
fig.update_layout({"plot_bgcolor": "#ffffff"}, width=800, height=800)
fig.update_xaxes(showgrid=True, gridwidth=1, gridcolor='#cccccc', zeroline=True, zerolinewidth=1, zerolinecolor='#cccccc')
fig.update_yaxes(showgrid=True, gridwidth=1, gridcolor='#cccccc', zeroline=True, zerolinewidth=1, zerolinecolor='#cccccc')
fig.show()
In [41]:
for node in leicester_osmnx_patters.nodes:
    node_bivariate_colour = leicester_emb_patters_df.loc[leicester_emb_patters_df["osmnx_node_id"] == node]
    if node_bivariate_colour.empty:
        leicester_osmnx_patters.nodes[node]["bivariate_colour_pooled"] = "#000000"
        leicester_osmnx_patters.nodes[node]["node_size"] = 1
    else:
        leicester_osmnx_patters.nodes[node]["bivariate_colour_pooled"] = node_bivariate_colour["bivariate_colour_pooled"].values[0]
        leicester_osmnx_patters.nodes[node]["node_size"] = 7
In [42]:
ox.plot_graph(
    leicester_osmnx_patters,
    node_color=[leicester_osmnx_patters.nodes[node]["bivariate_colour_pooled"] for node in leicester_osmnx_patters.nodes],
    node_size=[leicester_osmnx_patters.nodes[node]["node_size"] for node in leicester_osmnx_patters.nodes],
    bgcolor="#ffffff", edge_color="#000000", edge_linewidth=0.1,
    figsize=(12, 12))
Out[42]:
(<Figure size 1200x1200 with 1 Axes>, <Axes: >)

Interactive bivariate map ¶

In [43]:
del leicester_gdf
leicester_gdf = gpd.GeoDataFrame(
    leicester_osmnx_graph_prj_df,
    geometry=gpd.points_from_xy(
        leicester_osmnx_graph_prj_df.lon,
        leicester_osmnx_graph_prj_df.lat
    ),
    crs="EPSG:4326"
).merge(leicester_emb_patters_df, on='osmnx_node_id', how='left')
leicester_gdf.head()
Out[43]:
y x street_count elevation elevation_aster elevation_srtm lon lat osmnx_node_id ref ... EMB000 EMB001 EMB_dist EMB_angl EMB_colr EMB000pooled EMB001pooled clust clust_colour bivariate_colour_pooled
0 5.829804e+06 622151.977595 3 72.0 35 72 -1.196195 52.604506 194739 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 5.829991e+06 622098.041002 3 72.0 45 72 -1.196922 52.606196 1551014281 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 5.828827e+06 622259.813792 2 79.0 57 79 -1.194965 52.595696 326312 21 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 5.830107e+06 622077.742140 3 79.0 43 79 -1.197179 52.607245 326320 21 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 5.829673e+06 622220.645785 3 74.0 35 74 -1.195230 52.603314 2627867454 NaN ... 1.00017 -0.058451 0.335835 -0.058341 #5ec962 0.946935 -0.034518 5.0 #ffff33 #627f8c

5 rows × 22 columns

In [44]:
leicester_gdf[leicester_gdf["bivariate_colour_pooled"]!="#000000"].dropna(subset=["EMB000"]).explore(
    color="bivariate_colour_pooled",
    marker_kwds={"radius": 7}, style_kwds={"stroke": False},
    legend=True,
    tiles="CartoDB positron"
)

Classic measures ¶

In this section, we examine the correlations between the node and ego-graph pooled embeddings and three sets of classic measures: the OSMnx statistics for the nodes within the city-wide network, the same statistics for the nodes within the ego-graphs used to create the embeddings, and the basic statistics of those ego-graphs.
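
The correlation matrix in this section is computed with Kendall's rank correlation (method="kendall" in the call to corr). As a reminder of what that coefficient measures, here is a minimal pure-Python sketch of the tau-b variant, which counts concordant and discordant pairs and corrects for ties. It is an O(n²) illustration on toy data, not the routine pandas uses; the function name kendall_tau_b is hypothetical.

```python
import math
from itertools import combinations

def kendall_tau_b(x, y):
    """Kendall's tau-b for two equally long sequences of numbers."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # pairs tied in both variables are ignored
        elif dx == 0:
            ties_x += 1  # tied in x only
        elif dy == 0:
            ties_y += 1  # tied in y only
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x) * (concordant + discordant + ties_y))
    return (concordant - discordant) / denom if denom else float("nan")

# Two of the ten pairs are discordant, so tau = (8 - 2) / 10
tau = kendall_tau_b([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(tau)
```

Values close to 1 or -1 indicate that two variables rank the nodes in nearly the same or nearly the opposite order, regardless of whether the relationship is linear.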

Correlations with node and ego-graph stats ¶

We start by combining the embeddings dataframe with the pre-computed statistics. See osmnx_stats_node_centrality_with_egograph_Leicester.py and osmnx_stats_egograph_basic_Leicester.py for further details.

In [45]:
leicester_emb_stats_for_corr = \
    leicester_emb_df[["osmnx_node_id", "EMB000", "EMB001"]].merge(
    # Ego-graph pooled embeddings
        leicester_emb_pool_df.rename(columns={"EMB000":"EMB000pooled", "EMB001":"EMB001pooled"}),
        on="osmnx_node_id"
    ).merge(
    # Centrality including node-based and ego-graph-based
        pd.read_csv(this_repo_directory +
            "/data/leicester-1864_stats_node_centrality_with_egograph_dist500.csv"
            ).rename(columns={"node_id":"osmnx_node_id"}),
        on="osmnx_node_id"
    ).merge(
    # Ego-graph basic stats
        pd.read_csv(this_repo_directory +
            "/data/leicester-1864_stats_egograph_basic_dist500.csv"
            ).rename(columns={"node_id":"osmnx_node_id"}
            ).dropna(subset=["osmnx_node_id"])[
            ["osmnx_node_id","n", "m", "k_avg", "edge_length_total", "edge_length_avg",
            "streets_per_node_avg", "intersection_count", "street_length_total",
            "street_segment_count", "street_length_avg", "circuity_avg"]],
        on="osmnx_node_id"
    )

leicester_emb_stats_for_corr.head()
Out[45]:
osmnx_node_id EMB000 EMB001 EMB000pooled EMB001pooled closeness_networkwide betweenness_networkwide closeness_egograph betweenness_egograph n m k_avg edge_length_total edge_length_avg streets_per_node_avg intersection_count street_length_total street_segment_count street_length_avg circuity_avg
0 337976 0.700673 -0.058294 0.929014 -0.045372 0.000000 0.000000 0.000000 0.000000 11.0 11.0 2.0 1261.861 114.714636 3.0 11.0 1261.861 11.0 114.714636 1.038343
1 337979 1.052401 -0.071909 0.911989 -0.045298 0.000150 0.000149 0.166667 0.106061 13.0 13.0 2.0 2126.471 163.574692 3.0 13.0 2126.471 13.0 163.574692 1.030988
2 337983 1.176129 -0.014825 0.929369 -0.041823 0.000285 0.000298 0.230769 0.115385 14.0 14.0 2.0 1870.996 133.642571 3.0 14.0 1870.996 14.0 133.642571 1.048630
3 337985 1.200868 0.031910 0.930489 -0.040748 0.015656 0.000000 0.274725 0.000000 14.0 14.0 2.0 1815.929 129.709214 3.0 14.0 1815.929 14.0 129.709214 1.050192
4 337986 0.967397 0.003360 0.929369 -0.041823 0.000249 0.000373 0.198381 0.096154 14.0 14.0 2.0 1870.996 133.642571 3.0 14.0 1870.996 14.0 133.642571 1.048630

We can start by plotting all variables using a pair plot.

In [46]:
sns.pairplot(leicester_emb_stats_for_corr.drop(columns=["osmnx_node_id"]), kind="hist")
Out[46]:
<seaborn.axisgrid.PairGrid at 0x73fe55defaa0>
In [47]:
print(leicester_emb_stats_for_corr.drop(columns=["osmnx_node_id"]).corr(method="kendall"))
                           EMB000    EMB001  EMB000pooled  EMB001pooled  \
EMB000                   1.000000  0.059030      0.429212      0.051662   
EMB001                   0.059030  1.000000      0.125722      0.472115   
EMB000pooled             0.429212  0.125722      1.000000      0.122273   
EMB001pooled             0.051662  0.472115      0.122273      1.000000   
closeness_networkwide    0.133685  0.248864      0.259853      0.309848   
betweenness_networkwide  0.105972  0.203050      0.188030      0.099579   
closeness_egograph      -0.157206  0.327209     -0.163216      0.409736   
betweenness_egograph    -0.022609  0.242346      0.066057      0.139539   
n                        0.002532 -0.126187      0.048531     -0.258320   
m                       -0.031767 -0.084654     -0.002091     -0.193847   
k_avg                   -0.178911  0.218204     -0.256573      0.305057   
edge_length_total        0.017077  0.134238      0.070389      0.107784   
edge_length_avg          0.091904  0.438932      0.133785      0.690188   
streets_per_node_avg     0.229514  0.255644      0.428647      0.353873   
intersection_count       0.063879 -0.037603      0.158493     -0.129153   
street_length_total      0.077487  0.117393      0.164908      0.084585   
street_segment_count     0.035430 -0.082717      0.107530     -0.191771   
street_length_avg        0.073897  0.427956      0.101066      0.671931   
circuity_avg            -0.075397  0.001206     -0.139517     -0.002846   

                         closeness_networkwide  betweenness_networkwide  \
EMB000                                0.133685                 0.105972   
EMB001                                0.248864                 0.203050   
EMB000pooled                          0.259853                 0.188030   
EMB001pooled                          0.309848                 0.099579   
closeness_networkwide                 1.000000                 0.245741   
betweenness_networkwide               0.245741                 1.000000   
closeness_egograph                    0.060927                 0.037740   
betweenness_egograph                  0.146073                 0.666984   
n                                     0.047026                 0.268784   
m                                     0.058577                 0.266006   
k_avg                                 0.090216                 0.047317   
edge_length_total                     0.243250                 0.369374   
edge_length_avg                       0.308200                 0.140604   
streets_per_node_avg                  0.448797                 0.226158   
intersection_count                    0.157044                 0.319659   
street_length_total                   0.284100                 0.390023   
street_segment_count                  0.107291                 0.295211   
street_length_avg                     0.301521                 0.126597   
circuity_avg                         -0.116848                -0.028826   

                         closeness_egograph  betweenness_egograph         n  \
EMB000                            -0.157206             -0.022609  0.002532   
EMB001                             0.327209              0.242346 -0.126187   
EMB000pooled                      -0.163216              0.066057  0.048531   
EMB001pooled                       0.409736              0.139539 -0.258320   
closeness_networkwide              0.060927              0.146073  0.047026   
betweenness_networkwide            0.037740              0.666984  0.268784   
closeness_egograph                 1.000000              0.205350 -0.452824   
betweenness_egograph               0.205350              1.000000  0.145905   
n                                 -0.452824              0.145905  1.000000   
m                                 -0.367824              0.157868  0.891733   
k_avg                              0.323223              0.100583  0.013712   
edge_length_total                 -0.146388              0.270640  0.589867   
edge_length_avg                    0.443310              0.178665 -0.296360   
streets_per_node_avg              -0.037485              0.092405  0.097850   
intersection_count                -0.420096              0.171545  0.826980   
street_length_total               -0.219952              0.256335  0.617244   
street_segment_count              -0.440343              0.155661  0.902224   
street_length_avg                  0.460244              0.169671 -0.292572   
circuity_avg                       0.082450              0.038269 -0.118638   

                                m     k_avg  edge_length_total  \
EMB000                  -0.031767 -0.178911           0.017077   
EMB001                  -0.084654  0.218204           0.134238   
EMB000pooled            -0.002091 -0.256573           0.070389   
EMB001pooled            -0.193847  0.305057           0.107784   
closeness_networkwide    0.058577  0.090216           0.243250   
betweenness_networkwide  0.266006  0.047317           0.369374   
closeness_egograph      -0.367824  0.323223          -0.146388   
betweenness_egograph     0.157868  0.100583           0.270640   
n                        0.891733  0.013712           0.589867   
m                        1.000000  0.132379           0.665304   
k_avg                    0.132379  1.000000           0.305397   
edge_length_total        0.665304  0.305397           1.000000   
edge_length_avg         -0.234006  0.292995           0.104432   
streets_per_node_avg     0.117131  0.151434           0.321869   
intersection_count       0.810128  0.057609           0.674507   
street_length_total      0.656864  0.211502           0.879629   
street_segment_count     0.878092  0.040801           0.635415   
street_length_avg       -0.224905  0.323991           0.108293   
circuity_avg            -0.118920 -0.000375          -0.089911   

                         edge_length_avg  streets_per_node_avg  \
EMB000                          0.091904              0.229514   
EMB001                          0.438932              0.255644   
EMB000pooled                    0.133785              0.428647   
EMB001pooled                    0.690188              0.353873   
closeness_networkwide           0.308200              0.448797   
betweenness_networkwide         0.140604              0.226158   
closeness_egograph              0.443310             -0.037485   
betweenness_egograph            0.178665              0.092405   
n                              -0.296360              0.097850   
m                              -0.234006              0.117131   
k_avg                           0.292995              0.151434   
edge_length_total               0.104432              0.321869   
edge_length_avg                 1.000000              0.340156   
streets_per_node_avg            0.340156              1.000000   
intersection_count             -0.166574              0.279544   
street_length_total             0.080524              0.389084   
street_segment_count           -0.232123              0.197164   
street_length_avg               0.904994              0.316873   
circuity_avg                    0.048389             -0.153925   

                         intersection_count  street_length_total  \
EMB000                             0.063879             0.077487   
EMB001                            -0.037603             0.117393   
EMB000pooled                       0.158493             0.164908   
EMB001pooled                      -0.129153             0.084585   
closeness_networkwide              0.157044             0.284100   
betweenness_networkwide            0.319659             0.390023   
closeness_egograph                -0.420096            -0.219952   
betweenness_egograph               0.171545             0.256335   
n                                  0.826980             0.617244   
m                                  0.810128             0.656864   
k_avg                              0.057609             0.211502   
edge_length_total                  0.674507             0.879629   
edge_length_avg                   -0.166574             0.080524   
streets_per_node_avg               0.279544             0.389084   
intersection_count                 1.000000             0.739393   
street_length_total                0.739393             1.000000   
street_segment_count               0.910502             0.687454   
street_length_avg                 -0.170992             0.084041   
circuity_avg                      -0.143807            -0.106058   

                         street_segment_count  street_length_avg  circuity_avg  
EMB000                               0.035430           0.073897     -0.075397  
EMB001                              -0.082717           0.427956      0.001206  
EMB000pooled                         0.107530           0.101066     -0.139517  
EMB001pooled                        -0.191771           0.671931     -0.002846  
closeness_networkwide                0.107291           0.301521     -0.116848  
betweenness_networkwide              0.295211           0.126597     -0.028826  
closeness_egograph                  -0.440343           0.460244      0.082450  
betweenness_egograph                 0.155661           0.169671      0.038269  
n                                    0.902224          -0.292572     -0.118638  
m                                    0.878092          -0.224905     -0.118920  
k_avg                                0.040801           0.323991     -0.000375  
edge_length_total                    0.635415           0.108293     -0.089911  
edge_length_avg                     -0.232123           0.904994      0.048389  
streets_per_node_avg                 0.197164           0.316873     -0.153925  
intersection_count                   0.910502          -0.170992     -0.143807  
street_length_total                  0.687454           0.084041     -0.106058  
street_segment_count                 1.000000          -0.233779     -0.132164  
street_length_avg                   -0.233779           1.000000      0.043635  
circuity_avg                        -0.132164           0.043635      1.000000  

We can also double-check our results using Spearman's rank correlation coefficient (rho). As expected, the values are similar to those above, although slightly higher. The analysis above nonetheless remains the more robust of the two.

In [48]:
print(leicester_emb_stats_for_corr.drop(columns=["osmnx_node_id"]).corr(method="spearman"))
                           EMB000    EMB001  EMB000pooled  EMB001pooled  \
EMB000                   1.000000  0.080694      0.609703      0.084052   
EMB001                   0.080694  1.000000      0.199642      0.668468   
EMB000pooled             0.609703  0.199642      1.000000      0.200268   
EMB001pooled             0.084052  0.668468      0.200268      1.000000   
closeness_networkwide    0.204988  0.369898      0.394431      0.455986   
betweenness_networkwide  0.169743  0.301660      0.269842      0.146314   
closeness_egograph      -0.239776  0.487068     -0.241071      0.590862   
betweenness_egograph    -0.035738  0.356393      0.098773      0.199885   
n                        0.008290 -0.191902      0.069534     -0.378356   
m                       -0.044475 -0.129282     -0.005572     -0.287588   
k_avg                   -0.264504  0.334940     -0.369608      0.461683   
edge_length_total        0.026922  0.206463      0.103275      0.163345   
edge_length_avg          0.142144  0.626846      0.208853      0.875503   
streets_per_node_avg     0.343967  0.385186      0.610400      0.517214   
intersection_count       0.098779 -0.056143      0.230579     -0.189266   
street_length_total      0.120179  0.181846      0.243166      0.130509   
street_segment_count     0.057999 -0.125923      0.156997     -0.284130   
street_length_avg        0.114555  0.612067      0.159128      0.861252   
circuity_avg            -0.114095  0.000933     -0.207437     -0.006560   

                         closeness_networkwide  betweenness_networkwide  \
EMB000                                0.204988                 0.169743   
EMB001                                0.369898                 0.301660   
EMB000pooled                          0.394431                 0.269842   
EMB001pooled                          0.455986                 0.146314   
closeness_networkwide                 1.000000                 0.350715   
betweenness_networkwide               0.350715                 1.000000   
closeness_egograph                    0.087531                 0.051907   
betweenness_egograph                  0.212391                 0.835012   
n                                     0.069174                 0.382218   
m                                     0.087432                 0.380044   
k_avg                                 0.129970                 0.066617   
edge_length_total                     0.360226                 0.529066   
edge_length_avg                       0.456250                 0.204010   
streets_per_node_avg                  0.641996                 0.326195   
intersection_count                    0.233107                 0.453933   
street_length_total                   0.418098                 0.556922   
street_segment_count                  0.159911                 0.419911   
street_length_avg                     0.448151                 0.184167   
circuity_avg                         -0.176367                -0.041270   

                         closeness_egograph  betweenness_egograph         n  \
EMB000                            -0.239776             -0.035738  0.008290   
EMB001                             0.487068              0.356393 -0.191902   
EMB000pooled                      -0.241071              0.098773  0.069534   
EMB001pooled                       0.590862              0.199885 -0.378356   
closeness_networkwide              0.087531              0.212391  0.069174   
betweenness_networkwide            0.051907              0.835012  0.382218   
closeness_egograph                 1.000000              0.297919 -0.629959   
betweenness_egograph               0.297919              1.000000  0.213167   
n                                 -0.629959              0.213167  1.000000   
m                                 -0.523668              0.230822  0.980338   
k_avg                              0.458938              0.152805  0.015981   
edge_length_total                 -0.216647              0.397477  0.784142   
edge_length_avg                    0.620514              0.255275 -0.425638   
streets_per_node_avg              -0.060167              0.143119  0.144189   
intersection_count                -0.590339              0.251619  0.953861   
street_length_total               -0.322573              0.381201  0.806319   
street_segment_count              -0.616406              0.228824  0.983940   
street_length_avg                  0.640899              0.242273 -0.420878   
circuity_avg                       0.123876              0.054817 -0.173852   

                                m     k_avg  edge_length_total  \
EMB000                  -0.044475 -0.264504           0.026922   
EMB001                  -0.129282  0.334940           0.206463   
EMB000pooled            -0.005572 -0.369608           0.103275   
EMB001pooled            -0.287588  0.461683           0.163345   
closeness_networkwide    0.087432  0.129970           0.360226   
betweenness_networkwide  0.380044  0.066617           0.529066   
closeness_egograph      -0.523668  0.458938          -0.216647   
betweenness_egograph     0.230822  0.152805           0.397477   
n                        0.980338  0.015981           0.784142   
m                        1.000000  0.188244           0.851935   
k_avg                    0.188244  1.000000           0.432809   
edge_length_total        0.851935  0.432809           1.000000   
edge_length_avg         -0.341298  0.432676           0.160137   
streets_per_node_avg     0.172351  0.206537           0.464561   
intersection_count       0.944876  0.071353           0.857755   
street_length_total      0.842110  0.292745           0.972294   
street_segment_count     0.971088  0.050415           0.827154   
street_length_avg       -0.328617  0.474856           0.165242   
circuity_avg            -0.173620 -0.001432          -0.132332   

                         edge_length_avg  streets_per_node_avg  \
EMB000                          0.142144              0.343967   
EMB001                          0.626846              0.385186   
EMB000pooled                    0.208853              0.610400   
EMB001pooled                    0.875503              0.517214   
closeness_networkwide           0.456250              0.641996   
betweenness_networkwide         0.204010              0.326195   
closeness_egograph              0.620514             -0.060167   
betweenness_egograph            0.255275              0.143119   
n                              -0.425638              0.144189   
m                              -0.341298              0.172351   
k_avg                           0.432676              0.206537   
edge_length_total               0.160137              0.464561   
edge_length_avg                 1.000000              0.499462   
streets_per_node_avg            0.499462              1.000000   
intersection_count             -0.240841              0.406843   
street_length_total             0.128629              0.558292   
street_segment_count           -0.338067              0.291215   
street_length_avg               0.983105              0.469362   
circuity_avg                    0.067899             -0.230816   

                         intersection_count  street_length_total  \
EMB000                             0.098779             0.120179   
EMB001                            -0.056143             0.181846   
EMB000pooled                       0.230579             0.243166   
EMB001pooled                      -0.189266             0.130509   
closeness_networkwide              0.233107             0.418098   
betweenness_networkwide            0.453933             0.556922   
closeness_egograph                -0.590339            -0.322573   
betweenness_egograph               0.251619             0.381201   
n                                  0.953861             0.806319   
m                                  0.944876             0.842110   
k_avg                              0.071353             0.292745   
edge_length_total                  0.857755             0.972294   
edge_length_avg                   -0.240841             0.128629   
streets_per_node_avg               0.406843             0.558292   
intersection_count                 1.000000             0.904625   
street_length_total                0.904625             1.000000   
street_segment_count               0.985890             0.867512   
street_length_avg                 -0.248148             0.133033   
circuity_avg                      -0.209816            -0.156784   

                         street_segment_count  street_length_avg  circuity_avg  
EMB000                               0.057999           0.114555     -0.114095  
EMB001                              -0.125923           0.612067      0.000933  
EMB000pooled                         0.156997           0.159128     -0.207437  
EMB001pooled                        -0.284130           0.861252     -0.006560  
closeness_networkwide                0.159911           0.448151     -0.176367  
betweenness_networkwide              0.419911           0.184167     -0.041270  
closeness_egograph                  -0.616406           0.640899      0.123876  
betweenness_egograph                 0.228824           0.242273      0.054817  
n                                    0.983940          -0.420878     -0.173852  
m                                    0.971088          -0.328617     -0.173620  
k_avg                                0.050415           0.474856     -0.001432  
edge_length_total                    0.827154           0.165242     -0.132332  
edge_length_avg                     -0.338067           0.983105      0.067899  
streets_per_node_avg                 0.291215           0.469362     -0.230816  
intersection_count                   0.985890          -0.248148     -0.209816  
street_length_total                  0.867512           0.133033     -0.156784  
street_segment_count                 1.000000          -0.340965     -0.192780  
street_length_avg                   -0.340965           1.000000      0.060435  
circuity_avg                        -0.192780           0.060435      1.000000  
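
One way to see why the rank-based values can come out higher: Spearman's rho is Pearson's r computed on ranks rather than on raw values, so it rewards any monotonic relationship, linear or not. The sketch below illustrates this on made-up toy data (not values from this notebook). The helper names `pearson`, `average_ranks` and `spearman` are illustrative and are not defined elsewhere in the notebook.

```python
import math

def pearson(x, y):
    """Pearson's r for two equal-length sequences of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the run while the next sorted value ties with the current one
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks

def spearman(x, y):
    """Spearman's rho, computed as Pearson's r on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy data (an assumption for illustration): y grows monotonically
# but non-linearly with x
toy_x = [1, 2, 3, 4, 5, 6]
toy_y = [math.exp(v) for v in toy_x]

toy_pearson = pearson(toy_x, toy_y)
toy_spearman = spearman(toy_x, toy_y)
print(f"Pearson: {toy_pearson:.3f}, Spearman: {toy_spearman:.3f}")
```

On this toy example the rank correlation is perfect while the linear correlation is not, because the relationship is monotonic but curved.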

We export the full table, including the embeddings and the statistics, to create an additional visualisation as a pair plot in R (see this Quarto document).

In [49]:
leicester_emb_stats_for_corr.to_csv(this_repo_directory + "/data/leicester-1864_emb_gnnuf_model_v0-12_incl-pool-with-stats.csv", index=False)

Plots ¶

In this section we illustrate how the city-wide and ego-graph-based measures of centrality are spatially distributed, as a point of comparison for the maps above. In particular, we illustrate closeness and betweenness centrality using a bivariate scheme, similar to the one used above for the two embeddings.

We also include in the comparison the average number of streets per node and the average street length in the ego-graph, which are the values showing the highest correlation in the analysis above. For the average street length we take the opposite value, in order to better align the values and colours below with the embedding plots and maps.

In [50]:
leicester_osmnx_centrality = leicester_osmnx_graph_prj.copy()
In [51]:
leicester_centralities_networkwide_quantiles = leicester_emb_stats_for_corr[["closeness_networkwide", "betweenness_networkwide"]].quantile([1/3, 2/3]).values.transpose()
leicester_emb_stats_for_corr["bivariate_centrality_networkwide"] = leicester_emb_stats_for_corr.apply(
    lambda x: bivariate_colour([x["closeness_networkwide"], x["betweenness_networkwide"]], leicester_centralities_networkwide_quantiles), axis=1
)
In [52]:
leicester_centralities_egograph_quantiles = leicester_emb_stats_for_corr[["closeness_egograph", "betweenness_egograph"]].quantile([1/3, 2/3]).values.transpose()
leicester_emb_stats_for_corr["bivariate_centrality_egograph"] = leicester_emb_stats_for_corr.apply(
    lambda x: bivariate_colour([x["closeness_egograph"], x["betweenness_egograph"]], leicester_centralities_egograph_quantiles), axis=1
)
In [53]:
leicester_emb_stats_for_corr["street_length_avg_opp"] = leicester_emb_stats_for_corr["street_length_avg"]*(-1)
leicester_streets_egograph_quantiles = leicester_emb_stats_for_corr[["streets_per_node_avg", "street_length_avg_opp"]].quantile([1/3, 2/3]).values.transpose()
leicester_emb_stats_for_corr["streets_landc_egograph"] = leicester_emb_stats_for_corr.apply(
    lambda x: bivariate_colour([x["streets_per_node_avg"], x["street_length_avg_opp"]], leicester_streets_egograph_quantiles), axis=1
)
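
The three cells above all follow the same pattern: compute the 1/3 and 2/3 quantiles of two variables, then map each row to one colour of a 3x3 bivariate palette. The `bivariate_colour` function itself is defined earlier in the notebook. The sketch below is only a guess at how such a function could work: the name `bivariate_colour_sketch`, the palette and the handling of values falling exactly on a threshold are all assumptions.

```python
from bisect import bisect_left

# Illustrative 3x3 palette (an assumption, not necessarily the notebook's):
# rows = class of the first variable, columns = class of the second variable,
# both ordered from low to high
SKETCH_PALETTE = [
    ["#e8e8e8", "#dfb0d6", "#be64ac"],
    ["#ace4e4", "#a5add3", "#8c62aa"],
    ["#5ac8c8", "#5698b9", "#3b4994"],
]

def tercile_class(value, thresholds):
    """Return 0, 1 or 2: the number of thresholds strictly below the value."""
    return bisect_left(sorted(thresholds), value)

def bivariate_colour_sketch(values, quantiles, palette=SKETCH_PALETTE):
    """Map a pair of values to a palette colour.

    values:    [first_value, second_value]
    quantiles: [[first_q33, first_q66], [second_q33, second_q66]],
               i.e. one row of thresholds per variable, matching the
               transposed quantile arrays built in the cells above
    """
    first_class = tercile_class(values[0], quantiles[0])
    second_class = tercile_class(values[1], quantiles[1])
    return palette[first_class][second_class]

# Toy thresholds (made up for illustration)
toy_quantiles = [[0.2, 0.4], [10.0, 30.0]]
low_low = bivariate_colour_sketch([0.1, 5.0], toy_quantiles)
mid_high = bivariate_colour_sketch([0.3, 50.0], toy_quantiles)
high_high = bivariate_colour_sketch([0.5, 50.0], toy_quantiles)
print(low_low, mid_high, high_high)
```

Under this scheme, negating a variable before computing its quantiles, as done for `street_length_avg_opp`, reverses its class order, so that short average street lengths fall in the "high" class.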
In [54]:
for node in leicester_osmnx_centrality.nodes:
    # networkwide
    leicester_osmnx_centrality.nodes[node]["closeness_networkwide"] = None
    leicester_osmnx_centrality.nodes[node]["betweenness_networkwide"] = None
    leicester_osmnx_centrality.nodes[node]["bivariate_centrality_networkwide"] = "#000000"
    # egograph
    leicester_osmnx_centrality.nodes[node]["closeness_egograph"] = None
    leicester_osmnx_centrality.nodes[node]["betweenness_egograph"] = None
    leicester_osmnx_centrality.nodes[node]["bivariate_centrality_egograph"] = "#000000"
    # streets length and count
    leicester_osmnx_centrality.nodes[node]["streets_per_node_avg"] = None
    leicester_osmnx_centrality.nodes[node]["street_length_avg_opp"] = None
    leicester_osmnx_centrality.nodes[node]["streets_landc_egograph"] = "#000000"
    if node in leicester_emb_stats_for_corr["osmnx_node_id"].values:
        # networkwide
        leicester_osmnx_centrality.nodes[node]["closeness_networkwide"] = leicester_emb_stats_for_corr.loc[
            leicester_emb_stats_for_corr["osmnx_node_id"]==node, "closeness_networkwide"].values[0]
        leicester_osmnx_centrality.nodes[node]["betweenness_networkwide"] = leicester_emb_stats_for_corr.loc[
            leicester_emb_stats_for_corr["osmnx_node_id"]==node, "betweenness_networkwide"].values[0]
        leicester_osmnx_centrality.nodes[node]["bivariate_centrality_networkwide"] = leicester_emb_stats_for_corr.loc[
            leicester_emb_stats_for_corr["osmnx_node_id"]==node, "bivariate_centrality_networkwide"].values[0]
        # egograph
        leicester_osmnx_centrality.nodes[node]["closeness_egograph"] = leicester_emb_stats_for_corr.loc[
            leicester_emb_stats_for_corr["osmnx_node_id"]==node, "closeness_egograph"].values[0]
        leicester_osmnx_centrality.nodes[node]["betweenness_egograph"] = leicester_emb_stats_for_corr.loc[
            leicester_emb_stats_for_corr["osmnx_node_id"]==node, "betweenness_egograph"].values[0]
        leicester_osmnx_centrality.nodes[node]["bivariate_centrality_egograph"] = leicester_emb_stats_for_corr.loc[
            leicester_emb_stats_for_corr["osmnx_node_id"]==node, "bivariate_centrality_egograph"].values[0]
        # streets length and count
        leicester_osmnx_centrality.nodes[node]["streets_per_node_avg"] = leicester_emb_stats_for_corr.loc[
            leicester_emb_stats_for_corr["osmnx_node_id"]==node, "streets_per_node_avg"].values[0]
        leicester_osmnx_centrality.nodes[node]["street_length_avg_opp"] = leicester_emb_stats_for_corr.loc[
            leicester_emb_stats_for_corr["osmnx_node_id"]==node, "street_length_avg_opp"].values[0]
        leicester_osmnx_centrality.nodes[node]["streets_landc_egograph"] = leicester_emb_stats_for_corr.loc[
            leicester_emb_stats_for_corr["osmnx_node_id"]==node, "streets_landc_egograph"].values[0]
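
The loop above scans the statistics table once per node and per attribute. A possible alternative, sketched here on toy data and with plain Python structures only, is to index the rows by node id once and then build all the attributes with their defaults in a single pass. The function name `build_node_attributes` and its parameters are hypothetical and not part of the notebook.

```python
def build_node_attributes(rows, node_ids, value_attrs, colour_attrs,
                          id_key="osmnx_node_id", default_colour="#000000"):
    """Return {node_id: {attribute: value}}, with defaults for nodes lacking a row."""
    # Index the rows once by node id, keeping the first row if an id is repeated
    rows_by_id = {}
    for row in rows:
        rows_by_id.setdefault(row[id_key], row)

    attributes = {}
    for node_id in node_ids:
        row = rows_by_id.get(node_id)
        node_attrs = {}
        # Numeric attributes default to None when the node has no statistics
        for attr in value_attrs:
            node_attrs[attr] = row[attr] if row is not None else None
        # Colour attributes default to black when the node has no statistics
        for attr in colour_attrs:
            node_attrs[attr] = row[attr] if row is not None else default_colour
        attributes[node_id] = node_attrs
    return attributes

# Toy data (made up): node 1 has statistics, node 2 does not
toy_rows = [
    {"osmnx_node_id": 1,
     "closeness_networkwide": 0.02,
     "bivariate_centrality_networkwide": "#5ac8c8"},
]
toy_attributes = build_node_attributes(
    toy_rows,
    node_ids=[1, 2],
    value_attrs=["closeness_networkwide"],
    colour_attrs=["bivariate_centrality_networkwide"],
)
print(toy_attributes)
```

In the notebook's setting, the rows could presumably be obtained with `leicester_emb_stats_for_corr.to_dict("records")`, and the resulting dictionary of dictionaries passed to `nx.set_node_attributes(leicester_osmnx_centrality, attributes)`, which accepts this shape.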

City-wide centrality ¶

In [55]:
plt.figure(figsize=(7,7))
ax = plt.axes()
ax.set_facecolor("white")
ax.set_yscale('log')
plt.scatter(
    x=leicester_emb_stats_for_corr.closeness_networkwide,
    y=leicester_emb_stats_for_corr.betweenness_networkwide,
    c=leicester_emb_stats_for_corr.bivariate_centrality_networkwide,
    s=10, edgecolors='black', linewidth=0.1)
plt.xlabel("closeness_networkwide")
plt.ylabel("betweenness_networkwide")
plt.show()
No description has been provided for this image
In [56]:
ox.plot_graph(
    leicester_osmnx_centrality,
    node_color=[leicester_osmnx_centrality.nodes[node]["bivariate_centrality_networkwide"] for node in leicester_osmnx_centrality.nodes],
    node_size=[1 if leicester_osmnx_centrality.nodes[node]["bivariate_centrality_networkwide"]=="#000000" else 7 for node in leicester_osmnx_centrality.nodes],
    bgcolor="#ffffff", edge_color="#000000", edge_linewidth=0.1,
    figsize=(12, 12))
No description has been provided for this image
Out[56]:
(<Figure size 1200x1200 with 1 Axes>, <Axes: >)

Ego-graph centrality ¶

In [57]:
plt.figure(figsize=(7,7))
ax = plt.axes()
ax.set_facecolor("white")
ax.set_yscale('log')
plt.scatter(
    x=leicester_emb_stats_for_corr.closeness_egograph,
    y=leicester_emb_stats_for_corr.betweenness_egograph,
    c=leicester_emb_stats_for_corr.bivariate_centrality_egograph,
    s=10, edgecolors='black', linewidth=0.1)
plt.xlabel("closeness_egograph")
plt.ylabel("betweenness_egograph")
plt.show()
No description has been provided for this image
In [58]:
ox.plot_graph(
    leicester_osmnx_centrality,
    node_color=[leicester_osmnx_centrality.nodes[node]["bivariate_centrality_egograph"] for node in leicester_osmnx_centrality.nodes],
    node_size=[1 if leicester_osmnx_centrality.nodes[node]["bivariate_centrality_egograph"]=="#000000" else 7 for node in leicester_osmnx_centrality.nodes],
    bgcolor="#ffffff", edge_color="#000000", edge_linewidth=0.1,
    figsize=(12, 12))
No description has been provided for this image
Out[58]:
(<Figure size 1200x1200 with 1 Axes>, <Axes: >)

Street length and count ¶

In [59]:
plt.figure(figsize=(7,7))
ax = plt.axes()
ax.set_facecolor("white")
plt.scatter(
    x=leicester_emb_stats_for_corr.streets_per_node_avg,
    y=leicester_emb_stats_for_corr.street_length_avg_opp,
    c=leicester_emb_stats_for_corr.streets_landc_egograph,
    s=10, edgecolors='black', linewidth=0.1)
plt.xlabel("streets_per_node_avg")
plt.ylabel("street_length_avg_opp")
plt.show()
No description has been provided for this image
In [60]:
ox.plot_graph(
    leicester_osmnx_centrality,
    node_color=[leicester_osmnx_centrality.nodes[node]["streets_landc_egograph"] for node in leicester_osmnx_centrality.nodes],
    node_size=[1 if leicester_osmnx_centrality.nodes[node]["streets_landc_egograph"]=="#000000" else 7 for node in leicester_osmnx_centrality.nodes],
    bgcolor="#ffffff", edge_color="#000000", edge_linewidth=0.1,
    figsize=(12, 12))
No description has been provided for this image
Out[60]:
(<Figure size 1200x1200 with 1 Axes>, <Axes: >)

As indicated by the correlation scores, the map above, which illustrates the average number of streets per node and the average street length (taking the opposite value), shows the distribution closest to the pooled embeddings map, although with some differences, especially in the city centre. The difference from the node embeddings map is starker, as also reflected in the correlation values.
