05. Networks

Data Science for Economists

Irene Iodice

2026-03-01

Why should I care?

  • Big picture: networks turn isolated data points into a map of economic interdependence.

  • Key insight: centrality and paths reveal who really drives trade, prices and growth.

  • Today’s skill: load a real network in R and compute its core stats.

Roadmap for today

  1. Concepts & data
  2. Metrics: walks, paths, diameter
  3. Centrality: degree, closeness, betweenness
  4. Null-model baseline
  5. Application: Economic Complexity

Motivation

Disruptions in the Automotive Industry

Source: Deutsche Bank

Chip shortage & global supply-chains

Browse more products here: https://atlas.cid.harvard.edu/

Smartphone GVC before vs after 2017

2009–12

2017–20

How has the network thinned?

Characteristics of the Supply Chain network
Metric | 2009–2012 | 2013–2016 | 2017–2020
Countries | 21 | 16 | 15
Buyers | 18 | 12 | 11
Sellers | 13 | 10 | 9
Supply links | 224 | 154 | 130
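
To put numbers on the thinning, the counts in the table can be turned into percentage changes between the first and the last period. This is a minimal base-R sketch; the object and column names are illustrative, and the figures are copied from the table.

```r
# Counts from the table, first and last period
gvc <- data.frame(
  row.names  = c("countries", "buyers", "sellers", "links"),
  p2009_2012 = c(21, 18, 13, 224),
  p2017_2020 = c(15, 11,  9, 130)
)

# Percentage change between the first and the last period
gvc$pct_change <- round(100 * (gvc$p2017_2020 / gvc$p2009_2012 - 1), 1)
gvc

# Supply links per country in each of the three periods
links_per_country <- c(224, 154, 130) / c(21, 16, 15)
round(links_per_country, 1)
```

Links fall faster than the number of countries, so the network gets sparser as well as smaller.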

Concepts & Representation

What is a network?

  1. A set of nodes (vertices)
  2. A set of edges (links) connecting pairs of nodes

Trade illustration

  • Nodes = countries
  • Edge \(i\rightarrow j\) if country \(i\) exports to \(j\)
  • Collect edges in an adjacency matrix \(A\) with \(A_{ij}=1\) when the flow exists

Creating a network in R

library(igraph)

# From an adjacency matrix
A <- matrix(c(0,1,1, 1,0,0, 1,0,0), nrow = 3,
            dimnames = list(c("USA","CHN","DEU"), c("USA","CHN","DEU")))
g <- graph_from_adjacency_matrix(A, mode = "undirected")

# From an edge list (a different graph: it also contains a CHN-DEU link)
edges <- data.frame(from = c("USA","USA","CHN"), to = c("CHN","DEU","DEU"))
g2 <- graph_from_data_frame(edges, directed = FALSE)

# Inspect
vcount(g)    # 3 nodes
ecount(g)    # 2 edges (USA-CHN, USA-DEU)
ecount(g2)   # 3 edges

From simple to richer graph structures

Level 0 – Simple

  • Undirected, unweighted
  • No self-loops (\(A_{ii}=0\))
  • Captures trade relationship (bilateral)

Level 1 – Directed

  • Order matters: \(A_{ij}\neq A_{ji}\)
  • Captures export flows

Level 2 – Weighted

  • Edge values = intensity (volume, tariff)
  • Can even be negative (cost/friction)

Three representations of a toy trade network (USA, CHN, DEU): undirected/unweighted → directed → directed and weighted.
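
The three levels can be sketched in R by starting from the richest representation and stripping information away. The flow values below are made up for illustration, and the object names (`flows`, `W`, `A_dir`, `A_und`) are assumptions, not part of the lecture data.

```r
library(igraph)

# Hypothetical export flows (values are invented for illustration)
flows <- data.frame(
  from   = c("USA", "CHN", "CHN", "DEU"),
  to     = c("CHN", "USA", "DEU", "USA"),
  weight = c(150, 450, 110, 160)
)

g_w <- graph_from_data_frame(flows, directed = TRUE)

# Level 2: directed and weighted (entries are the flow values)
W <- as_adjacency_matrix(g_w, attr = "weight", sparse = FALSE)

# Level 1: directed, unweighted (1 if a flow exists)
A_dir <- (W > 0) * 1

# Level 0: undirected, unweighted (1 if trade flows in either direction)
A_und <- ((A_dir + t(A_dir)) > 0) * 1

W
A_dir
A_und
```

Each step down discards information: `A_dir` forgets volumes, and `A_und` also forgets who exports to whom.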

Example of Directed Graph

library(igraph)
library(tibble)   # provides tibble()
node_list <- tibble(id = 1:4)
edge_list <- tibble(from = c(1, 2, 2, 3, 4), to = c(2, 3, 4, 2, 1))
directed_g <- graph_from_data_frame(d = edge_list,
                   vertices = node_list, directed = TRUE)
as_adjacency_matrix(directed_g)   # get.adjacency() is the deprecated name
#> 4 x 4 sparse Matrix of class "dgCMatrix"
#>   1 2 3 4
#> 1 . 1 . .
#> 2 . . 1 1
#> 3 . 1 . .
#> 4 1 . . .

Example of Directed Graph (plot)

plot(directed_g, edge.arrow.size = 0.2)

Other types of graphs

Complete Graph

Star

Tree

Bipartite Network

Practical Corner: Bipartite Network

# generate a dataframe to represent all the edges of your bipartite network
d <- data.frame(
  country  = c("DEU", "DEU", "FRA", "FRA", "CAN", "CAN", "USA"),
  trade_agr = c("CETA", "EU", "EU", "CETA", "CETA", "USMCA", "USMCA")
)
# transform it into a graph
g <- graph_from_data_frame(d, directed = FALSE)
# define color and shape mappings to distinguish node types
V(g)$label <- V(g)$name
V(g)$type  <- 1
V(g)[name %in% d$trade_agr]$type <- 2
col   <- c("steelblue", "orange")
shape <- c("circle", "square")
plot(g,
     vertex.color = col[V(g)$type],
     vertex.shape = shape[V(g)$type])

Metrics

Walks, paths & geodesics

  1. Walk – Any sequence of edges in which each edge starts where the previous one ends; nodes may repeat: CHL \(\rightarrow\) BRA \(\rightarrow\) DEU \(\rightarrow\) BRA
  2. Path – A walk with no repeated node: CHL \(\rightarrow\) BRA \(\rightarrow\) DEU
  3. Length – Number of edges in the path from \(i\) to \(j\) (above: 2)
  4. Geodesic – Shortest path between two nodes; its length is the graph distance, denoted \(\ell(i,j)\). For CHL–DEU the geodesic is CHL \(\rightarrow\) BRA \(\rightarrow\) DEU.

Economic reading – The geodesic length tells you how many trade “hops” a Chilean export shock needs to reach Germany.

Seeing the metrics (toy trade network)

Walk (can revisit nodes) | Path (no repeated nodes) | Geodesic (shortest path, distance = 2)

Test corner

How many paths from 3 to 1? Which is shortest?

Stats for graphs

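# directed_g is directed: following edge directions, 3 -> 2 -> 4 -> 1 is the
# only route (there is no 2 -> 1 edge). mode = "all" ignores direction.
igraph::all_simple_paths(directed_g, 3, 1, mode = "all")
#> [[1]]
#> + 3/4 vertices, named, from 2c34291:
#> [1] 3 2 1
#>
#> [[2]]
#> + 4/4 vertices, named, from 2c34291:
#> [1] 3 2 4 1

igraph::shortest_paths(directed_g, 3, 1, mode = "all")
#> $vpath
#> $vpath[[1]]
#> + 3/4 vertices, named, from 2c34291:
#> [1] 3 2 1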

How connected is a trade network?

  1. Density (undirected, no self-loops) \[\delta = \frac{2m}{n(n-1)}\] where \(m\) = # edges, \(n\) = # nodes. [Guess for the trade network between countries?]
  2. Giant component size – Fraction of nodes in the largest connected piece. Ex. 94% of countries belong to one export web.
  3. Diameter \(\displaystyle \max_{i,j} \ell(i,j)\). “Farthest two countries need 6 hops.”
  4. Average path length \(\displaystyle \bar{\ell} = \frac{2}{n(n-1)} \sum_{i>j}\ell(i,j)\). Real trade: \(\bar{\ell}=3.1\)

Reading – Low diameter + high giant-component share imply shocks can spread globally; low density curbs redundancy.

Connectedness on our toy graph

Pentagon graph: CHL–USA–BRA–DEU–CHL, BRA–CHN. Five nodes, five edges.

Metric Value
Nodes \(n\) 5
Edges \(m\) 5
Density \(\delta\) 0.50
Giant component share 100%
Diameter 3
Avg. path length \(\bar{\ell}\) 1.6

All five countries sit in one component; any shock crosses the network in 3 hops or fewer.
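
The connectedness metrics for the toy graph can be recomputed with igraph. This is a sketch that rebuilds the graph from the five links listed on this slide; the object names `toy_edges` and `toy` are illustrative.

```r
library(igraph)

# The five links of the toy graph: CHL-USA-BRA-DEU-CHL plus BRA-CHN
toy_edges <- data.frame(
  from = c("CHL", "USA", "BRA", "DEU", "BRA"),
  to   = c("USA", "BRA", "DEU", "CHL", "CHN")
)
toy <- graph_from_data_frame(toy_edges, directed = FALSE)

vcount(toy)                                # number of nodes n
ecount(toy)                                # number of edges m
edge_density(toy)                          # 2m / (n(n-1))
max(components(toy)$csize) / vcount(toy)   # giant component share
diameter(toy)                              # longest geodesic
mean_distance(toy)                         # average geodesic length
distances(toy)                             # all pairwise geodesic lengths
```

The matrix returned by `distances()` is the useful one to inspect by hand: the diameter is its largest entry and the average path length is the mean of its off-diagonal entries.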

Key Network Structures in Economics

Network Type | Structure / Intuition | Density | Avg. Path | Economic Context
Star (hub-and-spoke) | One central hub connected to all others | Low | Very short | Logistics, supply chains, platform economies
Core–periphery | Dense central group + sparse outer nodes | Medium | Short to moderate | Global trade hierarchy: developed vs. emerging
Modular (community) | Dense internal clusters with few inter-cluster links | Medium | Moderate | Regional trade blocs, innovation clusters
Scale-free | Hubs dominate; many nodes with few links | Low | Very short | Financial contagion, tech networks
Bipartite (countries–products) | Two node types (e.g., exporters and goods) | Structured | Varies | Economic complexity, RCA-based trade analysis

Different network shapes reflect different economic dynamics – efficiency, fragility, inequality, or specialization.
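
A quick way to see how shape drives these metrics is to compare three textbook graphs of the same size. This is only a sketch: the ring and the complete graph are added here as reference points (they are not rows of the table), and `n = 10` is an arbitrary choice.

```r
library(igraph)

n <- 10
shapes <- list(
  star     = make_star(n, mode = "undirected"),  # one hub, n - 1 spokes
  ring     = make_ring(n),                       # every node has two neighbours
  complete = make_full_graph(n)                  # every pair is linked
)

structure_stats <- data.frame(
  density  = sapply(shapes, edge_density),
  avg_path = sapply(shapes, mean_distance),
  diameter = sapply(shapes, diameter)
)
round(structure_stats, 2)
```

The star and the ring have almost the same density, yet the star's hub keeps every pair within two hops while the ring needs up to five. That is the efficiency side of hub-and-spoke; the fragility side is that removing the hub disconnects everything.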

Application 1: Buyer–Supplier Network

Bernard et al. (2018) – Firm-to-firm trade between US and Norwegian firms

(a) HS 847990 – One Product

Sparse, modular network with clear firm clusters. Suggests specialized, non-overlapping supply chains. Shocks likely stay localized unless a hub is affected.

(b) All Products

Dense, tangled core with many interconnections. Reflects scale-free or core–periphery structure. Efficient but more exposed to contagion via central firms.

Application 2: Environmental cooperation agreements network

Carattini et al. (2022): countries are nodes, and an edge links two countries that share an environmental agreement.

Application 2: Environmental cooperation (cont.)

Reference: Carattini et al. (2022)

Centrality

From “how far?” to “who matters?”

  1. We have learned to measure distance
    • walks, paths, geodesics tell us how trade shocks travel
  1. Next question: which nodes shape those shocks most?
    • Does a hub with many partners matter more than a broker on the only East–West route?
  1. Centrality measures
    • Degree: direct reach
    • Closeness: speed of access
    • Betweenness: brokerage power
    • Eigenvector: inherited prestige

Take-away – centrality turns raw topology into economic influence.

Degree centrality (direct partners)

Pentagon graph with BRA highlighted (degree 3 — widest direct reach).

Country \(k_i\)
CHL 2
USA 2
BRA 3
DEU 2
CHN 1

BRA links to three partners – widest direct reach.

Local vs. global reach

  1. Degree centrality (direct reach) \[k_i = \sum_{j=1}^{n} A_{ij}\]
    • Notation: \(A_{ij}=1\) if country \(i\) exports to \(j\) (otherwise 0), as in the trade illustration, so \(k_i\) counts export destinations.
    • Concept: “How many direct partners do I trade with?”
    • Ex. China has \(k_{\text{CHN}}\approx 140\): wide export option set dampens a single-partner shock.
  2. Closeness centrality (inverse average distance) \[C_i = \frac{n-1}{\displaystyle\sum_{j\neq i} \ell(i,j)}\]
    • Notation: \(\ell(i,j)\) is the graph distance, i.e. the length of the shortest trade path (number of hops) from \(i\) to \(j\).
    • Concept: “How quickly can I reach every market?”
    • Ex. NLD is 1–2 hops away from most EU economies: rapid shock propagation.

Closeness centrality (avg. trade hops to everyone)

Pentagon graph with BRA highlighted (closeness 0.80 — never farther than 2 hops).

Country \(C_i\)
CHL 0.57
BRA 0.80
USA 0.67
DEU 0.67
CHN 0.50

BRA never farther than 2 hops – quickest access to all.
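
Both formulas can be applied by hand to the toy graph and checked against igraph's built-ins. This is a sketch that assumes the toy graph defined earlier (CHL–USA–BRA–DEU–CHL plus BRA–CHN); the object names are illustrative.

```r
library(igraph)

toy <- graph_from_data_frame(
  data.frame(from = c("CHL", "USA", "BRA", "DEU", "BRA"),
             to   = c("USA", "BRA", "DEU", "CHL", "CHN")),
  directed = FALSE
)

# Degree: row sums of the adjacency matrix
A <- as_adjacency_matrix(toy, sparse = FALSE)
k <- rowSums(A)

# Closeness: (n - 1) divided by the sum of geodesic lengths to all others
D <- distances(toy)
C <- (vcount(toy) - 1) / rowSums(D)

round(cbind(degree = k, closeness = C), 2)

# Cross-check against igraph's own functions
all.equal(unname(k), unname(degree(toy)))
all.equal(unname(C), unname(closeness(toy, normalized = TRUE)))
```

Nodes in symmetric positions (here USA and DEU, which both sit between CHL and BRA) must receive identical scores, which is a handy sanity check on any centrality table.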

Betweenness & Eigenvector centrality

Betweenness centrality (broker power)

\[B_i = \sum_{\substack{s\neq i\neq t \\ s<t}} \frac{\sigma_{st}(i)}{\sigma_{st}}\]

  • \(\sigma_{st}\) = number of shortest paths from \(s\) to \(t\); \(\sigma_{st}(i)\) = those through \(i\).
  • “What share of trade routes rely on me as a bridge?”
  • Ex. Panama (PAN) lies on many Asia–Atlantic routes: high \(B_i\), a chokepoint for global shipping.
Country \(B_i\)
CHL 0.5
BRA 3.5
USA 1.0
DEU 1.0
CHN 0

BRA lies on every shortest route to CHN and on half of the USA–DEU routes – choke-point.
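
Counting shortest paths by hand is error-prone, so it helps to let igraph do the bookkeeping. This is a sketch on the toy graph defined earlier (CHL–USA–BRA–DEU–CHL plus BRA–CHN); the normalisation step is an optional addition, not part of the formula on this slide.

```r
library(igraph)

toy <- graph_from_data_frame(
  data.frame(from = c("CHL", "USA", "BRA", "DEU", "BRA"),
             to   = c("USA", "BRA", "DEU", "CHL", "CHN")),
  directed = FALSE
)

# Unnormalised betweenness: sum over pairs s < t of sigma_st(i) / sigma_st
B <- betweenness(toy, directed = FALSE)
B

# Optional: divide by the number of pairs not involving i, (n-1)(n-2)/2,
# so that scores are comparable across networks of different size
n <- vcount(toy)
round(B / ((n - 1) * (n - 2) / 2), 2)
```

When two geodesics connect a pair (as for CHL and BRA, via USA or via DEU), each intermediate node is credited with one half.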

Eigenvector centrality (inherited influence)

\[e_i = \frac{1}{\lambda}\sum_{j=1}^{n} A_{ij}\,e_j\]

  • \(e\) is the leading right-eigenvector of \(A\); \(\lambda\) its eigenvalue.
  • “A partner counts more if they are central.”
  • Ex. Singapore trades heavily with USA, CHN; gains prestige from their importance.
Country \(e_i\)
CHL 0.78
BRA 1.00
USA 0.83
DEU 0.83
CHN 0.47

Scores are scaled so that the maximum is 1. USA and DEU outscore CHL despite equal degree: each trades with top-scoring BRA.
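
The eigenvector definition can be checked directly with base R's `eigen()`. This is a sketch on the toy graph (CHL–USA–BRA–DEU–CHL plus BRA–CHN); scaling the scores so that the maximum equals 1 is a presentation choice made here.

```r
countries <- c("CHL", "USA", "BRA", "DEU", "CHN")
A <- matrix(0, 5, 5, dimnames = list(countries, countries))

links <- rbind(c("CHL", "USA"), c("USA", "BRA"), c("BRA", "DEU"),
               c("DEU", "CHL"), c("BRA", "CHN"))
A[links] <- 1            # fill each link once ...
A[links[, 2:1]] <- 1     # ... and mirror it (undirected graph)

eig    <- eigen(A, symmetric = TRUE)
lambda <- eig$values[1]             # leading eigenvalue
e      <- abs(eig$vectors[, 1])     # leading eigenvector (sign is arbitrary)
names(e) <- countries

# Defining equation: e_i = (1 / lambda) * sum_j A_ij e_j
all.equal(as.vector(A %*% e) / lambda, unname(e))

round(e / max(e), 2)                # scaled so that the top score is 1
```

A design note: this toy graph is bipartite, so naive power iteration on \(A\) would oscillate instead of converging; calling `eigen()` sidesteps that.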

Computing centrality in R

library(igraph)
degree(directed_g)             # degree centrality
closeness(directed_g)          # closeness centrality
betweenness(directed_g)        # betweenness centrality
eigen_centrality(directed_g)$vector  # eigenvector centrality

All four measures are one-liners in igraph. The intuition matters more than the derivation.

Null Models

Degree Distributions

Complete graph on 6 nodes — every node connected to every other (degree 5).

Complete graph – All nodes have degree 5

Histogram: single bar at degree 5 with height 6 — uniform, no heterogeneity.

Degree distribution – Uniform, no heterogeneity

Real trade networks are rarely this uniform: most countries have few links, a few have many.
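
The contrast can be seen by tabulating degrees. In the sketch below the complete graph is the one from this slide, while the heterogeneous graph is an assumption: a preferential-attachment graph from `sample_pa()` used as a stand-in, not fitted to any trade data.

```r
library(igraph)
set.seed(42)

# Complete graph on 6 nodes: a single spike at degree 5
complete <- make_full_graph(6)
table(degree(complete))

# Stand-in for a heterogeneous network (assumption: preferential attachment)
hetero <- sample_pa(200, m = 1, directed = FALSE)
deg <- degree(hetero)

table(deg)                                # many low-degree nodes, a few hubs
c(median = median(deg), max = max(deg))

# Relative frequency of each degree 0, 1, 2, ...
round(degree_distribution(hetero), 3)
```

A large gap between the median and the maximum degree is the quick signature of heterogeneity.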

Why Use Null Models?

  • Real networks mix basic constraints (size, activity) and meaningful structure (hubs, communities).
  • Null models = random benchmarks matching basic constraints (e.g., node degrees).
  • Compare real network metrics (like path length) to the null: Is the observed structure surprising?
  • Example: Is avg. path length 3.1 in trade shorter/longer than expected by chance, given country degrees?

Key Null Models

Two common benchmarks:

  1. Erdős–Rényi \(G(n,p)\) (Simplest)
    • Randomly connects nodes with fixed probability \(p\).
    • Preserves: Expected density (set \(p\) to the observed \(\delta\)). Ignores node specifics.
    • Use: Baseline for “purely random” connections.
  2. Configuration Model (CM)
    • Randomly connects nodes while keeping each node’s exact degree the same as observed.
    • Preserves: Degree sequence \((k_1, \dots, k_n)\).
    • Use: Tests if structure (clustering, path length) differs from random mixing given node degrees.

Interpreting Deviations from Configuration Model

Compare Real Network Metric vs. CM Metric (e.g., Avg. Path Length \(\bar{\ell}\))

Purpose: Does structure beyond individual node degrees matter?

Common Interpretations for \(\bar{\ell}\):

  • Real \(\bar{\ell}\) < CM \(\bar{\ell}\): Network is more efficiently connected than expected by chance (given degrees). \(\rightarrow\) Suggests specific organizing principles (e.g., hubs).

  • Real \(\bar{\ell}\) > CM \(\bar{\ell}\): Network is more fragmented / distant than expected by chance (given degrees). \(\rightarrow\) Suggests barriers or clustering.

\(\implies\) Null models help isolate the impact of non-random network topology.

Which null model to use?

Model | Best suited question | igraph call
Erdős–Rényi \(G(n,p)\) | “Is the network denser than random?” | sample_gnp(n, p)
Configuration model | “Is structure surprising given node degrees?” | sample_degseq()

Other approaches: edge-rewiring (rewire(..., keeping_degseq())), gravity-constrained null models (common in trade economics — simulate from gravity model, threshold to get links).
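
A null-model comparison might look like the sketch below. The observed network here is an assumption: Zachary's karate club, which ships with igraph, stands in for a real trade graph. The number of simulations (`n_sim = 200`) is arbitrary.

```r
library(igraph)
set.seed(1)

# Stand-in for an observed network (assumption: replace with your trade graph)
g_obs   <- make_graph("Zachary")
obs_apl <- mean_distance(g_obs)

n_sim <- 200

# Erdos-Renyi benchmark: same number of nodes, same expected density.
# mean_distance() averages over reachable pairs if a draw is disconnected.
er_apl <- replicate(n_sim, {
  mean_distance(sample_gnp(vcount(g_obs), edge_density(g_obs)))
})

# Configuration-model benchmark: same degree sequence.
# method = "vl" samples simple, connected graphs.
cm_apl <- replicate(n_sim, {
  mean_distance(sample_degseq(degree(g_obs), method = "vl"))
})

c(observed = obs_apl, er_mean = mean(er_apl), cm_mean = mean(cm_apl))

# Share of configuration-model draws with a shorter average path
mean(cm_apl < obs_apl)
```

The last line is a simulation-based p-value in spirit: a share close to 0 or 1 means the observed path length is unusual given the degree sequence.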

Applications

“The productivity of a country resides in the diversity of its available non-tradable capabilities, and therefore, cross-country differences in income can be explained by differences in economic complexity, as measured by the diversity of capabilities present in a country and their interactions.”

– Hidalgo and Hausmann 2009

Matrix of diversification of countries

Source: Cristelli, Tacchella, Pietronero (2014)

The theory of hidden capabilities

A country can produce a product only when it has the required capabilities (Hidalgo & Hausmann 2009)

Source: Hidalgo et al. (2009)

Network structure & RCA

Let us index countries with \(c=1,\dots,n\) and products with \(p=1,\dots,P\).

The bipartite network is represented by a biadjacency matrix \(\mathbf{B}\) of size \(n \times P\):

\[ B_{cp}=\begin{cases} 1, & \text{if country } c \text{ is a significant exporter of product } p \\ 0, & \text{otherwise} \end{cases} \]

Significant exporter when:

\[ RCA_{cp}= \frac{\frac{q_{cp}}{\sum_{p'} q_{cp'}}}{\frac{\sum_{c'}q_{c'p}}{\sum_{c'}\sum_{p'}q_{c'p'}}} > 1 \]

where \(q_{cp}\) is the value of country \(c\)'s exports of product \(p\), i.e., whenever the share of product \(p\) in the country’s export basket is larger than its share in world trade.
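
The RCA filter is a few lines of base R. The export matrix below is hypothetical (three countries, three products, invented values); only the formula comes from the slide.

```r
# Hypothetical export values q_cp (rows = countries, columns = products)
q <- matrix(c(80, 10, 10,
              15, 40, 45,
              30,  0, 70),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("DEU", "FRA", "KOR"),
                            c("cars", "wine", "chips")))

basket_share <- q / rowSums(q)        # share of product p in country c's exports
world_share  <- colSums(q) / sum(q)   # share of product p in world exports

# RCA: divide every column of basket shares by that product's world share
rca <- sweep(basket_share, 2, world_share, "/")
round(rca, 2)

# Biadjacency matrix: 1 where the country is a significant exporter
B <- (rca > 1) * 1
B
```

In this made-up example FRA is a significant exporter of both wine and chips, so the resulting bipartite network has four links.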

Method of Reflections: Intuition

Key idea: iterate between diversification (how many products does a country export?) and ubiquity (how many countries export a product?).

  • Start: \(k_{c,0}\) = number of products exported (diversification), \(k_{p,0}\) = number of exporters (ubiquity)
  • Iterate: a country’s complexity rises if it exports products that few others can make
  • Converges to the Economic Complexity Index (ECI)
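
The slide gives only the intuition. As a sketch, the averaging recursion from Hidalgo and Hausmann (2009) sets \(k_{c,N}\) to the average of \(k_{p,N-1}\) over the products a country exports, and \(k_{p,N}\) to the average of \(k_{c,N-1}\) over a product's exporters. The matrix `B` and the helper `reflect()` below are hypothetical, written for illustration.

```r
# Hypothetical biadjacency matrix (rows = countries X, Y, Z; columns = products)
B <- matrix(c(1, 1, 1, 1,
              1, 1, 0, 0,
              1, 0, 0, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("X", "Y", "Z"), paste0("p", 1:4)))

k_c0 <- rowSums(B)   # diversification: products per country
k_p0 <- colSums(B)   # ubiquity: exporters per product

# Illustrative helper: run n_iter rounds of the averaging recursion
reflect <- function(B, n_iter) {
  k_c <- rowSums(B)
  k_p <- colSums(B)
  for (i in seq_len(n_iter)) {
    k_c_new <- as.vector(B %*% k_p) / rowSums(B)      # mean over own products
    k_p_new <- as.vector(t(B) %*% k_c) / colSums(B)   # mean over own exporters
    k_c <- k_c_new
    k_p <- k_p_new
  }
  list(country = setNames(k_c, rownames(B)),
       product = setNames(k_p, colnames(B)))
}

reflect(B, 1)$country   # average ubiquity of each country's export basket
reflect(B, 2)$country   # average diversification of countries with similar baskets
```

After one round, the diversified country X has the lowest average ubiquity: it exports products that few others make, which is what the method rewards.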

Countries in the Product Space

Source: Hidalgo et al. (2007)

Projecting the Bipartite Network

Let \(\mathbf{B}\) be the \(n \times P\) biadjacency matrix (\(n\) countries, \(P\) products) where \(B_{cp} = 1\) if country \(c\) significantly exports product \(p\).

Step: Project onto products

Define a product–product relatedness matrix \(\mathbf{M}\):

\[M_{pp'} = \sum_c B_{cp} \cdot B_{cp'}\]

This gives the number of countries that export both products \(p\) and \(p'\). The matrix \(\mathbf{M}\) is symmetric and captures the co-export intensity between products.

Weighted version: Normalise by product ubiquity:

\[\phi_{pp'} = \frac{\sum_c B_{cp} \cdot B_{cp'}}{\max(k_{p,0},\, k_{p',0})}\]

where \(k_{p,0} = \sum_c B_{cp}\) is the number of exporters of product \(p\).

Interpretation: Two products are close in the product space if many countries export them both.
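
Both projections reduce to matrix algebra. This is a sketch with a hypothetical biadjacency matrix (three countries, four products); the object names are illustrative.

```r
# Hypothetical biadjacency matrix (rows = countries, columns = products)
B <- matrix(c(1, 1, 1, 1,
              1, 1, 0, 0,
              1, 0, 0, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("X", "Y", "Z"), paste0("p", 1:4)))

# Co-export counts: M[p, p'] = number of countries exporting both products
M <- crossprod(B)              # same as t(B) %*% B
M

# Ubiquity of each product (also the diagonal of M)
k_p0 <- colSums(B)

# Proximity: co-export count divided by the larger of the two ubiquities
phi <- M / outer(k_p0, k_p0, pmax)
diag(phi) <- 0                 # drop self-links
round(phi, 2)
```

Dividing by the larger ubiquity keeps a very common product from looking close to everything: p1 is co-exported with every other product here, but its proximity to the rare p3 is only 1/3.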

The Product Space of Trade

Source: Hidalgo et al. (2007)

Modern R Tools for Networks

tidygraph + ggraph: Tidy Network Analysis

tidygraph wraps igraph in a tidy interface; ggraph provides ggplot2-style network plots:

library(tidygraph)
library(ggraph)

tg <- as_tbl_graph(directed_g) |>
  mutate(centrality = centrality_degree())

ggraph(tg, layout = "stress") +
  geom_edge_link(arrow = arrow(length = unit(3, "mm")),
                 end_cap = circle(3, "mm")) +
  geom_node_point(aes(size = centrality), color = "steelblue") +
  geom_node_text(aes(label = name), repel = TRUE) +
  theme_graph()

  • Same igraph engine underneath, but dplyr-style verbs for node/edge manipulation
  • ggraph produces publication-quality network plots

Community Detection

Identify clusters of densely connected nodes:

library(igraph)

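# Louvain algorithm (fast, widely used)
set.seed(123)                # both algorithms are randomised
g <- sample_gnp(100, 0.05)
communities <- cluster_louvain(g)
membership(communities)      # which community each node belongs to
modularity(communities)      # partition quality (at most 1; near 0 = no better than random)

# Leiden algorithm (improved Louvain, avoids poorly-connected communities)
# cluster_leiden() defaults to the CPM objective; request modularity explicitly
communities_leiden <- cluster_leiden(g, objective_function = "modularity",
                                     resolution = 1)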

Economic application: community detection on trade networks reveals regional blocs, production clusters, or supply chain modules.

Sources

  • Jackson, Matthew O. Social and Economic Networks. Vol. 3. Princeton University Press, 2008.

Suggested references: