Quickstart
In this page, we will quickly work through the Julia code necessary to load a network dataset, assign weights to it, and identify and score missing links. I recommend running this code in an interactive, notebook style environment; I use Quarto and Visual Studio Code. More details about all of the functions from the MissingLinks package that are used herein are available in the API Documentation.
Load packages
First, we need to load some packages. The Missing Links code is distributed as a Julia package, so we need to load MissingLinks. We also need to load the GeoDataFrames package for spatial data manipulation, the Plots library for visualization, the Graphs library for working with graph data structures, the StatsBase library for statistics, and the Mmap package for working with memory-mapped files. Most of these packages are available from the Julia general registry, and can be installed with the Pkg-mode add command. Mmap is built into Julia and does not need to be installed. MissingLinks, however, is not in the general registry; to install it you should enter Pkg-mode (press ]
) and then type add https://github.com/mattwigway/MissingLinks.jl
. In a nutshell, you will run this code, after entering package mode by pressing ]
. You'll run this code in the REPL (in VSCode, choose View -> Command Palette and search for "Julia: Start REPL"). All of the other code you'll run should be in your notebook interface so you can save it. While it is optional, I do recommend creating a Julia environment to keep MissingLinks code separate from any other Julia projects you may work on.
add GeoDataFrames Plots Graphs StatsBase
add https://github.com/mattwigway/MissingLinks.jl
Press backspace to exit package mode.
Once packages are installed, we are ready to load them.
using MissingLinks, GeoDataFrames, Plots, Mmap, Graphs, StatsBase
If you're using Quarto and VSCode, place this code between triple-backticks to create a Julia cell, then hover over it and click "run cell":
```{julia}
using MissingLinks, GeoDataFrames, Plots, Mmap, Graphs, StatsBase
```
Loading data
Next, we will read our data. To keep things fast for this example, we are working with a tiny segment of the Charlotte pedestrian network, in the Northlake area. We need data on the pedestrian network, and origins and destinations. In this case we are using residential parcels as origins, and a selection of commercial destinations in the Northlake area digitized from OpenStreetMap as destinations. The data are included with the MissingLinks package; running
MissingLinks.get_example_data()
"example_data"
will create a folder example_data
in the current working directory, containing the data.
All of our layers are already projected to EPSG:32119 (State Plane North Carolina, meters). I recommend using a meter-based coordinate system. The system will not work correctly with a geographic coordinate system, and has not been tested with feet.
sidewalks = GeoDataFrames.read("example_data/Northlake.gpkg")
parcels = GeoDataFrames.read("example_data/Northlake_Parcels.gpkg")
destinations = GeoDataFrames.read("example_data/Northlake_Destinations.gpkg")
Row | geometry |
---|---|
IGeometr… | |
1 | Geometry: wkbPoint |
2 | Geometry: wkbPoint |
3 | Geometry: wkbPoint |
4 | Geometry: wkbPoint |
5 | Geometry: wkbPoint |
6 | Geometry: wkbPoint |
7 | Geometry: wkbPoint |
8 | Geometry: wkbPoint |
9 | Geometry: wkbPoint |
10 | Geometry: wkbPoint |
11 | Geometry: wkbPoint |
12 | Geometry: wkbPoint |
13 | Geometry: wkbPoint |
14 | Geometry: wkbPoint |
15 | Geometry: wkbPoint |
We can plot what we read in:
# sidewalks have an elevation component that we need to remove to plot
sidewalks.geom = MissingLinks.remove_elevation!.(sidewalks.geom)
plot(parcels.geom, color="#0f0", legend=false, aspect_ratio=:equal)
plot!(sidewalks.geom, color="#f00")
plot!(destinations.geometry, color="#f00")
Building the network
Next, we need to convert our network GIS layer into a mathematical "graph"—a data structure of nodes and edges representing intersections and streets. This network dataset is already "fully noded"—i.e. all intersections occur at the ends of line segments. The tool can also work with "semi-noded" data—where all intersections are at the end of one line segment, but may be in the middle of another. To use that kind of data, run the preprocessing function semi_to_fully_noded
before building the graph.
To build the graph, we will use the graph_from_gdal
function, which will convert GeoDataFrames into a graph.
graph = graph_from_gdal(sidewalks)
Meta graph based on a Graphs.SimpleGraphs.SimpleGraph{Int64} with vertex labels of type MissingLinks.VertexID, vertex metadata of type Tuple{Float64, Float64}, edge metadata of type @NamedTuple{length_m::Float64, link_type::Union{Missing, String}, geom::ArchGDAL.IGeometry{ArchGDAL.wkbLineString}}, graph metadata given by nothing, and default weight Inf
It is possible to specify more than one layer (for example, if you have sidewalks and crosswalks in separate layers) by separating the layers with commas. It is also possible to set a tolerance for when nodes are considered coincident. However, I recommend leaving this at the default small value and using add_short_edges!
to close such gaps, to avoid issues that are described in the linked documentation. Similarly, remove_tiny_islands
can be used to remove small "islands" in the graph that may be data errors. There are also tools for troubleshooting the graph, namely find_dead_ends
to find locations that are dead ends, and find_disconnected_crossings
to find places where sidewalks cross without intersecting. While both of these things will be present in a correct graph, they may also represent data errors.
Assigning weights to nodes
Next, we need to assign weights to the nodes. We will create two sets of weights, for origins and destinations. For origins, the weights are just the number of units on the parcel. The weight of each parcel will get divided among the graph edges within 20 meters, and then the weight for each edge will be evenly divided between the nodes it connects. [:,1]
retrieves the first column of the weights (if multiple weight columns are specified instead of just units, there will be one column in the output for each column specified. This avoids re-doing computationally-intensive geographic calculations if there are more weight columns—for instance, if both our origins and our destinations were coded in the parcel layer, or if we wanted to measure accessibility from multiple types of origin).
origin_weights = create_graph_weights(graph, parcels, [:units], 20)[:,1]
1696-element Vector{Float64}:
1.966666666666667
2.4333333333333336
0.6833333333333332
1.4916666666666667
3.225
3.5833333333333335
1.7833333333333337
1.75
2.341666666666667
4.058333333333334
⋮
0.0
0.0
2.999625468164794
3.0329588014981272
3.0329588014981272
0.0
1.8928571428571426
0.0
0.0
We can look at what proportion of the units are served by sidewalks and therefore were assigned to the network. Some units will not be assigned to any edge and therefore will not count towards origin_weights.
sum(origin_weights) / sum(parcels.units)
0.8947839046199701
We can similarly create our destination weights. There is no weight column here—all of the destinations are equally weighted. However, we need a weight column so we just create a column of all ones. Since destinations are further from the sidewalk network due to parking lots, we use a 100 meter distance threshold. In the paper we use parcels and a 20 meter threshold, on the assumption that the edges of parcels will be close to the road, but here we have point data on destinations.
destinations.one .= 1
destination_weights = create_graph_weights(graph, destinations, [:one], 100)[:, 1]
sum(destination_weights) / sum(destinations.one)
0.9333333333333333
Creating the distance matrix
A point-to-point distance matrix is used extensively throughout the algorithm for identify and scoring links. We store distances in this matrix as UInt16 integer meters, to save memory (each UInt16 can represent values from 0-65535, and takes only two bytes of memory, as opposed to four or eight for traditional integers and longs). For a network as small as the one we're working with here, it would be possible to store this distance matrix in memory. However, in real-world applications, often the matrix will be larger than available memory, so storing it on disk and using memory-mapping of the disk file is recommended. Memory-mapping lets the computer treat a portion of the disk as if it were memory, and the CPU and the operating system work to make sure that the data are available quickly when needed.
matrix_file = open("quickstart_distances.bin", "w+")
matrix = Mmap.mmap(matrix_file, Matrix{UInt16}, (nv(graph), nv(graph)))
# this just makes sure the matrix is filled with 0s before we begin. This should not make any real difference as all values should be overwritten but is just a good practice.
fill!(matrix, 0)
fill_distance_matrix!(graph, matrix)
1696-element Vector{Vector{UInt16}}:
[0x0000, 0x0037, 0x0022, 0x0059, 0x00a7, 0x00db, 0x0072, 0x00ea, 0x006c, 0x0099 … 0x080c, 0x0714, 0x0742, 0x0b0c, 0x0afc, 0x0ae6, 0x0271, 0x01a9, 0x0336, 0x050a]
[0x0037, 0x0000, 0x0059, 0x0022, 0x0070, 0x00a5, 0x003b, 0x00b4, 0x0036, 0x0063 … 0x07d5, 0x06dd, 0x070b, 0x0ad5, 0x0ac6, 0x0aaf, 0x023a, 0x0172, 0x02ff, 0x04d3]
[0x0022, 0x0059, 0x0000, 0x007b, 0x00c9, 0x00fd, 0x0094, 0x010d, 0x008e, 0x00bc … 0x082e, 0x0736, 0x0764, 0x0b2e, 0x0b1e, 0x0b08, 0x0293, 0x01cb, 0x0358, 0x052c]
[0x0059, 0x0022, 0x007b, 0x0000, 0x004e, 0x0083, 0x0019, 0x0092, 0x0014, 0x0041 … 0x07b3, 0x06bb, 0x06e9, 0x0ab4, 0x0aa4, 0x0a8d, 0x0218, 0x0151, 0x02de, 0x04b1]
[0x00a7, 0x0070, 0x00c9, 0x004e, 0x0000, 0x0035, 0x0035, 0x0044, 0x003a, 0x0067 … 0x07da, 0x06e2, 0x0710, 0x0ada, 0x0aca, 0x0ab4, 0x023e, 0x0177, 0x0304, 0x04d8]
[0x00db, 0x00a5, 0x00fd, 0x0083, 0x0035, 0x0000, 0x0069, 0x000f, 0x006f, 0x009c … 0x080f, 0x0716, 0x0744, 0x0b0f, 0x0aff, 0x0ae8, 0x0273, 0x01ac, 0x0339, 0x050c]
[0x0072, 0x003b, 0x0094, 0x0019, 0x0035, 0x0069, 0x0000, 0x0079, 0x0006, 0x0033 … 0x07a5, 0x06ad, 0x06db, 0x0aa5, 0x0a96, 0x0a7f, 0x020a, 0x0142, 0x02cf, 0x04a3]
[0x00ea, 0x00b4, 0x010d, 0x0092, 0x0044, 0x000f, 0x0079, 0x0000, 0x007e, 0x00ab … 0x081e, 0x0725, 0x0754, 0x0b1e, 0x0b0e, 0x0af7, 0x0282, 0x01bb, 0x0348, 0x051b]
[0x006c, 0x0036, 0x008e, 0x0014, 0x003a, 0x006f, 0x0006, 0x007e, 0x0000, 0x002d … 0x07a0, 0x06a7, 0x06d5, 0x0aa0, 0x0a90, 0x0a79, 0x0204, 0x013d, 0x02ca, 0x049d]
[0x0099, 0x0063, 0x00bc, 0x0041, 0x0067, 0x009c, 0x0033, 0x00ab, 0x002d, 0x0000 … 0x0772, 0x067a, 0x06a8, 0x0a73, 0x0a63, 0x0a4c, 0x01d7, 0x0110, 0x029d, 0x0470]
⋮
[0x0714, 0x06dd, 0x0736, 0x06bb, 0x06e2, 0x0716, 0x06ad, 0x0725, 0x06a7, 0x067a … 0x02f4, 0x0000, 0x00a7, 0x05f5, 0x05e5, 0x05ce, 0x04a3, 0x056b, 0x03de, 0x08cb]
[0x0742, 0x070b, 0x0764, 0x06e9, 0x0710, 0x0744, 0x06db, 0x0754, 0x06d5, 0x06a8 … 0x0322, 0x00a7, 0x0000, 0x0623, 0x0613, 0x05fc, 0x04d1, 0x0599, 0x040c, 0x08f9]
[0x0b0c, 0x0ad5, 0x0b2e, 0x0ab4, 0x0ada, 0x0b0f, 0x0aa5, 0x0b1e, 0x0aa0, 0x0a73 … 0x0300, 0x05f5, 0x0623, 0x0000, 0x0010, 0x0027, 0x089c, 0x0963, 0x07d6, 0x0cc4]
[0x0afc, 0x0ac6, 0x0b1e, 0x0aa4, 0x0aca, 0x0aff, 0x0a96, 0x0b0e, 0x0a90, 0x0a63 … 0x02f0, 0x05e5, 0x0613, 0x0010, 0x0000, 0x0017, 0x088c, 0x0953, 0x07c6, 0x0cb4]
[0x0ae6, 0x0aaf, 0x0b08, 0x0a8d, 0x0ab4, 0x0ae8, 0x0a7f, 0x0af7, 0x0a79, 0x0a4c … 0x02da, 0x05ce, 0x05fc, 0x0027, 0x0017, 0x0000, 0x0875, 0x093d, 0x07af, 0x0c9d]
[0x0271, 0x023a, 0x0293, 0x0218, 0x023e, 0x0273, 0x020a, 0x0282, 0x0204, 0x01d7 … 0x059b, 0x04a3, 0x04d1, 0x089c, 0x088c, 0x0875, 0x0000, 0x00c7, 0x00c6, 0x0428]
[0x01a9, 0x0172, 0x01cb, 0x0151, 0x0177, 0x01ac, 0x0142, 0x01bb, 0x013d, 0x0110 … 0x0663, 0x056b, 0x0599, 0x0963, 0x0953, 0x093d, 0x00c7, 0x0000, 0x018d, 0x0361]
[0x0336, 0x02ff, 0x0358, 0x02de, 0x0304, 0x0339, 0x02cf, 0x0348, 0x02ca, 0x029d … 0x04d6, 0x03de, 0x040c, 0x07d6, 0x07c6, 0x07af, 0x00c6, 0x018d, 0x0000, 0x04ee]
[0x050a, 0x04d3, 0x052c, 0x04b1, 0x04d8, 0x050c, 0x04a3, 0x051b, 0x049d, 0x0470 … 0x09c3, 0x08cb, 0x08f9, 0x0cc4, 0x0cb4, 0x0c9d, 0x0428, 0x0361, 0x04ee, 0x0000]
Link identification
We are now ready to identify missing links. The identify_potential_missing_links
will do this and return a vector of places in the network that meet the specified criteria (in this case, no longer than 100 meters, and with a network distance of 1000 meters or more; you can change these values in the code below).
all_links = identify_potential_missing_links(graph, matrix, 100, 1000)
896-element Vector{MissingLinks.CandidateLink{UInt16}}:
77m CandidateLink connecting edge 30-32@48m to 704-705@0m; network distance 65535m
72m CandidateLink connecting edge 32-685@28m to 704-705@0m; network distance 65535m
68m CandidateLink connecting edge 66-67@0m to 868-1538@34m; network distance 1074m
27m CandidateLink connecting edge 66-67@0m to 865-1539@0m; network distance 1129m
27m CandidateLink connecting edge 66-67@0m to 864-865@0m; network distance 1129m
27m CandidateLink connecting edge 66-67@0m to 863-864@30m; network distance 1135m
47m CandidateLink connecting edge 66-67@0m to 1538-1539@24m; network distance 1098m
67m CandidateLink connecting edge 66-68@36m to 866-867@0m; network distance 1003m
66m CandidateLink connecting edge 66-68@36m to 866-868@0m; network distance 1003m
37m CandidateLink connecting edge 66-68@36m to 868-1538@34m; network distance 1037m
⋮
96m CandidateLink connecting edge 1645-1647@149m to 702-703@2m; network distance 1224m
87m CandidateLink connecting edge 1645-1647@149m to 703-716@37m; network distance 1187m
91m CandidateLink connecting edge 1645-1647@63m to 715-716@0m; network distance 1042m
88m CandidateLink connecting edge 1645-1647@50m to 714-715@23m; network distance 1008m
87m CandidateLink connecting edge 1654-1655@106m to 418-419@30m; network distance 1076m
92m CandidateLink connecting edge 1654-1655@108m to 420-421@31m; network distance 1075m
87m CandidateLink connecting edge 1654-1655@106m to 419-421@0m; network distance 1076m
98m CandidateLink connecting edge 1655-1657@0m to 418-419@30m; network distance 1120m
98m CandidateLink connecting edge 1655-1657@0m to 419-421@0m; network distance 1120m
We can plot the links identified (shown here in blue):
all_links_geo = links_to_gis(graph, all_links)
plot(sidewalks.geom, color="#000", legend=false, aspect_ratio=:equal)
plot!(all_links_geo.geometry, color="#4B9CD3")
Link deduplication
Clearly, most of these links are essentially duplicates. The next step is to deduplicate them, by merging links where both ends are within 100m of one another (configurable by changing the number below).
links = deduplicate_links(all_links, matrix, 100)
22-element Vector{MissingLinks.CandidateLink{UInt16}}:
72m CandidateLink connecting edge 32-685@28m to 704-705@0m; network distance 65535m
26m CandidateLink connecting edge 66-68@12m to 865-1539@3m; network distance 1113m
45m CandidateLink connecting edge 114-115@34m to 1586-1587@173m; network distance 1116m
43m CandidateLink connecting edge 114-115@34m to 1590-1591@138m; network distance 1081m
98m CandidateLink connecting edge 196-198@43m to 1124-1138@12m; network distance 1348m
86m CandidateLink connecting edge 266-268@19m to 1124-1138@12m; network distance 1266m
86m CandidateLink connecting edge 267-269@81m to 1142-1143@12m; network distance 1479m
98m CandidateLink connecting edge 295-297@49m to 1139-1141@77m; network distance 1532m
23m CandidateLink connecting edge 354-359@6m to 881-1039@119m; network distance 1862m
20m CandidateLink connecting edge 876-882@12m to 1155-1163@11m; network distance 2008m
⋮
20m CandidateLink connecting edge 447-1166@92m to 1158-1159@29m; network distance 2475m
20m CandidateLink connecting edge 487-488@69m to 1366-1367@25m; network distance 65535m
7m CandidateLink connecting edge 487-488@60m to 489-490@67m; network distance 65535m
28m CandidateLink connecting edge 676-677@75m to 704-705@0m; network distance 65535m
91m CandidateLink connecting edge 629-686@58m to 704-705@0m; network distance 65535m
26m CandidateLink connecting edge 667-669@7m to 700-701@35m; network distance 65535m
80m CandidateLink connecting edge 674-675@56m to 704-705@0m; network distance 65535m
85m CandidateLink connecting edge 703-716@16m to 1641-1647@126m; network distance 1231m
88m CandidateLink connecting edge 714-715@23m to 1645-1647@50m; network distance 1008m
We can now plot these deduplicated links:
links_geo = links_to_gis(graph, links)
plot(sidewalks.geom, color="#000", legend=false, aspect_ratio=:equal)
# set line width wider to make the links easier to see
plot!(links_geo.geometry, color="#4B9CD3", lw=2)
Link scoring
The final step is to score the identified links, using the score_links
function. For this we need to specify what our origin and destination weights are, as well as our distance decay function. Here, I use a 1-mile (1609 meter) cutoff. This will return a vector with the score for each link. We have to specify both the distance decay function (first argument) and the point at which that function goes to zero (last argument); for a smooth decay function, I recommend instead specifying a piecewise function that goes to zero when the result is small enough that additional destinations do not materially affect accessibility.
link_scores = score_links(d -> d < 1609, links, matrix, origin_weights, destination_weights, 1609)
22-element Vector{Float64}:
0.0
0.0
1.727099085103875
1.727099085103875
767.2176519158897
741.2873119298721
537.5906372737354
573.9455530960228
4248.055325817284
3545.6235159582443
⋮
1982.501313235136
953.7111111111101
951.6982204893737
0.0
0.0
0.0
0.0
0.0
0.2789844733029542
Saving links to GIS
Generally, you will want to export your results to GIS for further analysis, including the scores. You can create a GeoDataFrame including the scores like this:
links_geo = links_to_gis(graph, links, :score=>link_scores)
Row | fr_edge_src | fr_edge_tgt | fr_dist_from_start | fr_dist_to_end | to_edge_src | to_edge_tgt | to_dist_from_start | to_dist_to_end | geographic_length_m | network_length_m | geometry | score |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Int64 | Int64 | UInt16 | UInt16 | Int64 | Int64 | UInt16 | UInt16 | UInt16 | Float64 | IGeometr… | Float64 | |
1 | 32 | 685 | 28 | 12 | 704 | 705 | 0 | 38 | 72 | Inf | Geometry: wkbLineString | 0.0 |
2 | 66 | 68 | 12 | 24 | 865 | 1539 | 3 | 27 | 26 | 1113.0 | Geometry: wkbLineString | 0.0 |
3 | 114 | 115 | 34 | 0 | 1586 | 1587 | 173 | 0 | 45 | 1116.0 | Geometry: wkbLineString | 1.7271 |
4 | 114 | 115 | 34 | 0 | 1590 | 1591 | 138 | 0 | 43 | 1081.0 | Geometry: wkbLineString | 1.7271 |
5 | 196 | 198 | 43 | 39 | 1124 | 1138 | 12 | 0 | 98 | 1348.0 | Geometry: wkbLineString | 767.218 |
6 | 266 | 268 | 19 | 77 | 1124 | 1138 | 12 | 0 | 86 | 1266.0 | Geometry: wkbLineString | 741.287 |
7 | 267 | 269 | 81 | 71 | 1142 | 1143 | 12 | 13 | 86 | 1479.0 | Geometry: wkbLineString | 537.591 |
8 | 295 | 297 | 49 | 53 | 1139 | 1141 | 77 | 0 | 98 | 1532.0 | Geometry: wkbLineString | 573.946 |
9 | 354 | 359 | 6 | 0 | 881 | 1039 | 119 | 2 | 23 | 1862.0 | Geometry: wkbLineString | 4248.06 |
10 | 876 | 882 | 12 | 12 | 1155 | 1163 | 11 | 10 | 20 | 2008.0 | Geometry: wkbLineString | 3545.62 |
11 | 386 | 388 | 0 | 89 | 881 | 1039 | 118 | 3 | 94 | 1939.0 | Geometry: wkbLineString | 2715.64 |
12 | 418 | 419 | 30 | 0 | 1654 | 1655 | 106 | 44 | 87 | 1076.0 | Geometry: wkbLineString | 46.8859 |
13 | 447 | 1166 | 55 | 41 | 1159 | 1167 | 35 | 0 | 19 | 2548.0 | Geometry: wkbLineString | 1696.19 |
14 | 447 | 1166 | 92 | 4 | 1158 | 1159 | 29 | 1 | 20 | 2475.0 | Geometry: wkbLineString | 1982.5 |
15 | 487 | 488 | 69 | 41 | 1366 | 1367 | 25 | 41 | 20 | Inf | Geometry: wkbLineString | 953.711 |
16 | 487 | 488 | 60 | 50 | 489 | 490 | 67 | 59 | 7 | Inf | Geometry: wkbLineString | 951.698 |
17 | 676 | 677 | 75 | 11 | 704 | 705 | 0 | 38 | 28 | Inf | Geometry: wkbLineString | 0.0 |
18 | 629 | 686 | 58 | 0 | 704 | 705 | 0 | 38 | 91 | Inf | Geometry: wkbLineString | 0.0 |
19 | 667 | 669 | 7 | 17 | 700 | 701 | 35 | 0 | 26 | Inf | Geometry: wkbLineString | 0.0 |
20 | 674 | 675 | 56 | 0 | 704 | 705 | 0 | 38 | 80 | Inf | Geometry: wkbLineString | 0.0 |
21 | 703 | 716 | 16 | 79 | 1641 | 1647 | 126 | 23 | 85 | 1231.0 | Geometry: wkbLineString | 0.0 |
22 | 714 | 715 | 23 | 22 | 1645 | 1647 | 50 | 99 | 88 | 1008.0 | Geometry: wkbLineString | 0.278984 |
You can then write this to GIS file like this:
GeoDataFrames.write("links.gpkg", links_geo)
We can also use it in further plotting. For example, this plot shows the top five links (the competerank
function just ranks values largest to smallest).
plot(sidewalks.geom, color="#000", legend=false, aspect_ratio=:equal)
# set line width wider to make the links easier to see
plot!(links_geo[competerank(links_geo.score) .<= 5, :geometry], color="#4B9CD3", lw=2)
As we might expect, the new links that connect the completely disconnected neighborhood on the west side are most valuable. Those that shorten trips in already connected areas are less valuable.