Tutorial

This tutorial presents some of the main features of the package. To illustrate their usage, we analyse the subway networks of a selection of well-known cities.

Before we start, let us import all necessary packages:

[1]:
import matplotlib.pyplot as plt
import networkx as nx
import sys
import os
import numpy as np
from importlib import reload
from tabulate import tabulate
from IPython.display import HTML, display
import pandas as pd

import gdMetriX

We begin by loading the data from the datasets module, which provides access to the 'subways' dataset. The first call may take a while, because the dataset has to be downloaded. Afterwards, the data is cached locally.

[2]:
subways = list(gdMetriX.iterate_dataset('subways'))

To see what we are dealing with, we first draw the graphs using matplotlib and NetworkX.

[3]:
plt.figure(figsize=(30, 12))
for i, (name, graph) in enumerate(subways):
    # Set up the matplotlib axis
    ax = plt.subplot(3, 5, i + 1)
    ax.set_title(name)

    # Read the node positions from the graph
    pos = gdMetriX.get_node_positions(graph)

    # Draw on the axis using NetworkX
    nx.draw_networkx_edges(graph, pos, ax=ax)
    nx.draw_networkx_nodes(graph, pos, ax=ax, node_size=25)
plt.show()

../_images/notebooks_tutorial_5_0.png

Crossings

To begin with, we might want to determine where different lines cross each other without a transit station. To do so, we have to find all edge crossings, which can be achieved as follows.

[4]:
plt.figure(figsize=(30, 12))
for i, (name, graph) in enumerate(subways):
    # Obtain a list of all crossings
    crossing_list = gdMetriX.get_crossings(graph)

    # Draw the graph first
    ax = plt.subplot(3, 5, i + 1)
    ax.set_title(name)
    pos = gdMetriX.get_node_positions(graph)
    nx.draw_networkx_edges(graph, pos, ax=ax)
    nx.draw_networkx_nodes(graph, pos, ax=ax, node_size=25)

    # Keep only crossings at a single point. Crossings along a line (overlapping
    # edges) are dropped, as we assume those to represent the same set of tracks.
    crossing_points = [
        crossing for crossing in crossing_list
        if type(crossing.pos) is gdMetriX.crossingDataTypes.CrossingPoint
    ]

    # Draw the crossings as red dots
    x_values = [crossing.pos.x for crossing in crossing_points]
    y_values = [crossing.pos.y for crossing in crossing_points]
    ax.plot(x_values, y_values, 'ro', ms=12)
plt.show()

../_images/notebooks_tutorial_7_0.png
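
For straight-line edges, a crossing at a single point can be detected with an orientation test. The following is a minimal plain-Python sketch of that idea, not the way gdMetriX computes crossings. The function names `orientation`, `segments_cross` and `crossing_point` are made up for illustration, and overlapping (collinear) edges are deliberately not handled.

```python
def orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p).

    Returns 1 for a left turn, -1 for a right turn and 0 if collinear.
    """
    cross = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (cross > 0) - (cross < 0)


def segments_cross(a, b, c, d):
    """True if segments ab and cd cross in a single interior point.

    Segments that merely share an endpoint (two edges meeting at a
    station) or that are collinear are not reported.
    """
    return (orientation(a, b, c) * orientation(a, b, d) < 0
            and orientation(c, d, a) * orientation(c, d, b) < 0)


def crossing_point(a, b, c, d):
    """Return the crossing point of ab and cd, or None if they do not cross."""
    if not segments_cross(a, b, c, d):
        return None
    # Solve a + t * (b - a) = c + s * (d - c) for t
    rx, ry = b[0] - a[0], b[1] - a[1]
    sx, sy = d[0] - c[0], d[1] - c[1]
    denom = rx * sy - ry * sx
    t = ((c[0] - a[0]) * sy - (c[1] - a[1]) * sx) / denom
    return (a[0] + t * rx, a[1] + t * ry)


# Two diagonals of a square cross in its centre
print(crossing_point((0, 0), (2, 2), (0, 2), (2, 0)))
# Two edges that meet at a shared station do not count as a crossing
print(segments_cross((0, 0), (1, 1), (1, 1), (2, 0)))
```

Testing every pair of edges this way takes quadratic time, which is fine for an illustration but slow for large graphs.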

Global attributes

Next, we create a list of global metrics for all subway networks. Refer to the documentation for a complete list of available metrics.

[5]:
header = ["Name", "n", "m", "Visual Symmetry", "Area", "Tight area", "Aspect ratio",
          "Bounding box", "Height", "Width", "Center of mass", "Closest elements",
          "Closest points", "Concentration", "Gabriel ratio", "Homogeneity",
          "Horizontal balance", "Vertical Balance"]

rows = []
for name, g in subways:
    rows.append([
        name, g.order(), g.number_of_edges(), gdMetriX.visual_symmetry(g),
        gdMetriX.area(g), gdMetriX.area_tight(g), gdMetriX.aspect_ratio(g),
        # Assumption: bounding_box is exposed at package level like the other
        # boundary metrics; it fills the "Bounding box" column of the output.
        gdMetriX.bounding_box(g),
        gdMetriX.height(g), gdMetriX.width(g), gdMetriX.center_of_mass(g),
        # Only the third entry of the returned tuples is shown in the table
        gdMetriX.closest_pair_of_elements(g)[2], gdMetriX.closest_pair_of_points(g)[2],
        gdMetriX.concentration(g), gdMetriX.gabriel_ratio(g), gdMetriX.homogeneity(g),
        gdMetriX.horizontal_balance(g), gdMetriX.vertical_balance(g),
    ])

# Use the header as column names instead of storing it as a data row
df = pd.DataFrame(rows, columns=header)
display(HTML(df.to_html(index=False)))

| Name | n | m | Visual Symmetry | Area | Tight area | Aspect ratio | Bounding box | Height | Width | Center of mass | Closest elements | Closest points | Concentration | Gabriel ratio | Homogeneity | Horizontal balance | Vertical Balance |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Barcelona | 128 | 286 | 0.707802 | 0.0259 | 0.01177 | 0.554745 | (41.343333, 2.0353131, 41.4632, 2.251389) | 0.216076 | 0.119867 | v(41.400485598437506,2.15647111171875) | 0.000405 | 0.000424 | 0.535433 | 0.997946 | 1.0 | 0.3125 | -0.125 |
| Beijing | 104 | 228 | 0.803643 | 0.120774 | 0.087965 | 0.467214 | (39.8444, 116.171944, 40.081944, 116.6803702) | 0.508426 | 0.237544 | v(39.945206184615394,116.41587960384615) | 0.00388 | 0.00388 | 0.592233 | 0.996345 | 0.999998 | -0.307692 | -0.326923 |
| Berlin | 170 | 364 | 0.760584 | 0.077929 | 0.051501 | 0.414477 | (52.416111, 13.200278, 52.595833, 13.633889) | 0.433611 | 0.179722 | v(52.50868810588238,13.37589586470588) | 0.001882 | 0.002515 | 0.597633 | 0.999084 | 1.0 | -0.517647 | 0.094118 |
| Chicago | 141 | 298 | 0.723793 | 0.103609 | 0.053603 | 0.840046 | (41.722596, -87.900876, 42.07379, -87.605857) | 0.295019 | 0.351194 | v(41.89951913333332,-87.68078670000001) | 0.0 | 0.000931 | 0.65 | 0.997369 | 1.0 | 0.77305 | -0.219858 |
| HongKong | 82 | 170 | 0.556229 | 0.086499 | 0.059596 | 0.805013 | (22.264372, 113.936616, 22.528253, 114.264413) | 0.327797 | 0.263881 | v(22.349040353658538,114.15831179268294) | 0.000472 | 0.002814 | 0.592593 | 0.994706 | 1.0 | 0.707317 | -0.585366 |
| London | 266 | 308 | 0.694677 | 0.261322 | 0.178695 | 0.351455 | (-0.611389, 51.402222, 0.2509, 51.705278) | 0.303056 | 0.862289 | v(-0.17365018421052628,51.53866646616539) | 0.0 | 0.007593 | 0.633962 | 0.99743 | 1.0 | -0.315789 | 0.067669 |
| Madrid | 208 | 480 | 0.829943 | 0.093491 | 0.049705 | 0.703776 | (40.30359236, -3.811841853, 40.56010038, -3.447367855) | 0.364474 | 0.256508 | v(40.426940676971185,-3.67461855901923) | 0.0 | 0.0 | 0.676329 | 0.998412 | 1.0 | -0.644231 | -0.163462 |
| Mexico | 147 | 328 | 0.877274 | 0.053625 | 0.038794 | 0.823691 | (19.324427, -99.215899, 19.534595, -98.960745) | 0.255154 | 0.210168 | v(19.426182782312928,-99.12700180952378) | 0.000278 | 0.004529 | 0.534247 | 0.998675 | 1.0 | -0.605442 | -0.088435 |
| Moscow | 134 | 312 | 0.89252 | 0.134583 | 0.095862 | 0.799659 | (55.569722, 37.407778, 55.897778, 37.818023) | 0.410245 | 0.328056 | v(55.73970225373134,37.61470240298508) | 0.00042 | 0.002095 | 0.458647 | 0.998057 | 0.999944 | -0.059701 | 0.19403 |
| NewYork | 433 | 950 | 0.747255 | 0.090182 | 0.061338 | 0.844298 | (40.576178, -74.031361, 40.903, -73.755426) | 0.275935 | 0.326822 | v(40.72945972979215,-73.93355020785226) | 0.0 | 0.002191 | 0.608796 | 0.999123 | 1.0 | -0.538106 | -0.136259 |
| Osaka | 108 | 246 | 0.81138 | 0.036782 | 0.024961 | 0.884782 | (34.556028, 135.4122, 34.75992, 135.5926) | 0.1804 | 0.203892 | v(34.67056508333334,135.51295197222223) | 0.0 | 0.002163 | 0.514019 | 0.997507 | 0.999583 | 0.222222 | 0.296296 |
| Paris | 299 | 712 | 0.846795 | 0.038396 | 0.028243 | 0.719643 | (2.228331, 48.77977215, 2.459317446, 48.946) | 0.166228 | 0.230986 | v(2.3437453335351166,48.860113404983295) | 0.000001 | 0.002712 | 0.57047 | 0.998945 | 0.999999 | -0.070234 | 0.043478 |
| Seoul | 392 | 874 | 0.798977 | 0.922816 | 0.538437 | 0.663896 | (36.769628, 126.616708, 37.948611, 127.39943) | 0.782722 | 1.178983 | v(37.503326494897976,126.97862405612248) | 0.000415 | 0.003272 | 0.785166 | 0.999378 | 1.0 | -0.030612 | 0.790816 |
| Shanghai | 148 | 158 | 0.75015 | 0.152405 | 0.091711 | 0.917791 | (31.0025, 121.224, 31.41, 121.598) | 0.374 | 0.4075 | v(31.228932081081084,121.46187795945949) | 0.0 | 0.005099 | 0.605442 | 0.997096 | 1.0 | 0.628378 | 0.310811 |
| Tokyo | 217 | 262 | 0.861586 | 0.070981 | 0.04305 | 0.592517 | (35.586859, 139.612885, 35.791938, 139.959) | 0.346115 | 0.205079 | v(35.69779647004608,139.74785092165905) | 0.0 | 0.0 | 0.597222 | 0.997213 | 1.0 | -0.62212 | 0.152074 |

We can see that Moscow has the most symmetric subway layout (visual symmetry of 0.89252), while Hong Kong has the most irregular one (visual symmetry of just 0.556229). According to the tight area, which is computed from the convex hull of the layout, Seoul covers by far the largest area (0.538437), more than 45 times that of the smallest network, Barcelona (0.01177). London comes second with 0.178695, which is still more than 15 times the area of Barcelona.
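
To make the difference between the two area columns concrete: the Area values in the table match Height times Width, that is, the area of the bounding box, whereas the tight area is described as the area of the convex hull. The following stand-alone sketch computes both for a handful of made-up points. It is an illustration only, not the library's implementation, and the function names are hypothetical.

```python
def bounding_box_area(points):
    """Area of the axis-parallel bounding box of the points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        # Drop points that would create a right turn or a straight line
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    n = len(vertices)
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2


# Made-up station positions: three corners of a triangle plus one inner point
stations = [(0, 0), (4, 0), (0, 3), (1, 1)]
print(bounding_box_area(stations))           # bounding-box area
print(polygon_area(convex_hull(stations)))   # convex-hull ("tight") area
```

The hull area can never exceed the bounding-box area, which is consistent with every Tight area value in the table being smaller than the corresponding Area value.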

Edge- and nodewise parameters

Some properties are not global, but are instead defined for each individual node, edge or crossing. We will illustrate how to analyse such local features by looking at the distribution of edge lengths, which tells us how far apart neighbouring stations are.

We begin by drawing some violin plots for the edge length distribution.

[6]:
data = []
for name, g in subways:
    pos = gdMetriX.get_node_positions(g)
    edge_lengths = [gdMetriX.euclidean_distance(pos[u], pos[v]) for u, v in g.edges()]
    # Keep each label together with its data so that sorting cannot mix them up
    data.append((name, edge_lengths))

# Sort the cities by their longest edge, in descending order
data.sort(key=lambda item: max(item[1]), reverse=True)
labels = [name for name, _ in data]
lengths = [edge_lengths for _, edge_lengths in data]

fig, ax = plt.subplots()
fig.set_figheight(4)
fig.set_figwidth(16)
ax.set_xticks(np.arange(1, len(labels) + 1), labels=labels)
ax.violinplot(lengths)

plt.show()

../_images/notebooks_tutorial_12_0.png
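
The edge lengths collected above are plain Euclidean distances between the two end points of each edge. As a minimal stand-alone sketch, the same quantity and a few summary statistics can be computed with the standard library alone. The station names and coordinates below are made up, not taken from the dataset, and `edge_lengths` and `summarise` are hypothetical helper names.

```python
import math
import statistics


def edge_lengths(pos, edges):
    """Euclidean length of every edge, given a node -> (x, y) mapping."""
    return [math.dist(pos[u], pos[v]) for u, v in edges]


def summarise(lengths):
    """A few of the statistics that a violin plot visualises."""
    return {
        "min": min(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.mean(lengths),
        "max": max(lengths),
    }


# A made-up line with three stations
pos = {"A": (0.0, 0.0), "B": (3.0, 4.0), "C": (3.0, 10.0)}
edges = [("A", "B"), ("B", "C")]

lengths = edge_lengths(pos, edges)
print(lengths)
print(summarise(lengths))
```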

We can see that the distances between stations vary hugely. To spot densely populated areas, let us plot the average edge length on a heatmap using the heatmap function.

[7]:
plt.figure(figsize=(20, 10))
for i, (name, g) in enumerate(subways):
    ax = plt.subplot(3, 5, i + 1)
    ax.set_title(name)

    pos = gdMetriX.get_node_positions(g)
    edge_lengths = [gdMetriX.euclidean_distance(pos[u], pos[v]) for u, v in g.edges()]
    # Place each edge length at the midpoint of its edge
    edge_pos = [
        gdMetriX.Vector.from_point(pos[u]).mid(gdMetriX.Vector.from_point(pos[v]))
        for u, v in g.edges()
    ]

    # Average the edge lengths falling into each cell of a grid of size 20
    heatmap = gdMetriX.heatmap(g, edge_pos, edge_lengths, 20)
    ax.imshow(heatmap, cmap='viridis', interpolation='nearest')

plt.show()

../_images/notebooks_tutorial_14_0.png

For some cities, the average edge length seems to increase with the distance from the centre. For the most part, however, the edge lengths seem to be roughly evenly distributed. Perhaps stations remain close together along each subway line while the distance between lines increases. We may be able to detect a pattern by looking at the node density instead.

Instead of averaging a value per grid cell, as we did for the edge lengths above, we now simply count the number of vertices in each grid cell of the heatmap. To do so, we set average=False and supply None as the list of values.

[8]:
plt.figure(figsize=(20, 10))
for i, (name, g) in enumerate(subways):
    ax = plt.subplot(3, 5, i + 1)
    ax.set_title(name)

    # Count the nodes per grid cell: no values are supplied and averaging is off
    pos = list(gdMetriX.get_node_positions(g).values())
    heatmap = gdMetriX.heatmap(g, pos, None, 20, average=False)
    ax.imshow(heatmap, cmap='viridis', interpolation='nearest')

plt.show()

../_images/notebooks_tutorial_16_0.png

It seems that our hypothesis was correct: this time, the densely populated city centres are easily discernible.
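
The two heatmap modes used in this section differ only in how the values that fall into the same grid cell are combined. The sketch below illustrates that difference on a plain grid spanning the bounding box of the points. It is an illustrative stand-in under simplifying assumptions, not gdMetriX's heatmap implementation, and the function name `grid_heatmap` and its parameters are hypothetical.

```python
def grid_heatmap(points, values=None, cells=4, average=True):
    """Bin points into a cells x cells grid over their bounding box.

    With average=True each cell holds the mean of its values (0.0 if empty).
    With average=False each cell holds the sum of its values; if no values
    are given, every point counts as 1, so the sum is a point count.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, min_y = min(xs), min(ys)
    # Fall back to 1.0 to avoid dividing by zero for degenerate layouts
    span_x = (max(xs) - min_x) or 1.0
    span_y = (max(ys) - min_y) or 1.0

    if values is None:
        values = [1.0] * len(points)

    totals = [[0.0] * cells for _ in range(cells)]
    counts = [[0] * cells for _ in range(cells)]
    for (x, y), value in zip(points, values):
        # Points on the upper boundary belong to the last row/column
        col = min(int((x - min_x) / span_x * cells), cells - 1)
        row = min(int((y - min_y) / span_y * cells), cells - 1)
        totals[row][col] += value
        counts[row][col] += 1

    if not average:
        return totals
    return [
        [total / count if count else 0.0 for total, count in zip(t_row, c_row)]
        for t_row, c_row in zip(totals, counts)
    ]


# Made-up edge midpoints with their edge lengths
midpoints = [(0.0, 0.0), (0.1, 0.1), (1.0, 1.0)]
lengths = [2.0, 4.0, 10.0]

print(grid_heatmap(midpoints, lengths, cells=2))              # mean per cell
print(grid_heatmap(midpoints, None, cells=2, average=False))  # count per cell
```

In averaging mode, a cell with two short edges and a cell with one long edge show different values regardless of how many elements they contain. Counting mode ignores the values and reflects only the density.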