Data Science for Economists
2026-03-01
Big picture: networks turn isolated data points into a map of economic interdependence.
Key insight: centrality and paths reveal who really drives trade, prices and growth.
Today’s skill: load a real network in R and compute its core stats.
Motivation
Source: Deutsche Bank
Browse more products here: https://atlas.cid.harvard.edu/


| | [2009:2012] | [2013:2016] | [2017:2020] |
|---|---|---|---|
| # of different countries | 21 | 16 | 15 |
| # of different Buyers | 18 | 12 | 11 |
| # of different Sellers | 13 | 10 | 9 |
| Number of supply links | 224 | 154 | 130 |
Concepts & Representation
Trade illustration
library(igraph)
# From an adjacency matrix
A <- matrix(c(0,1,1, 1,0,0, 1,0,0), nrow = 3,
dimnames = list(c("USA","CHN","DEU"), c("USA","CHN","DEU")))
g <- graph_from_adjacency_matrix(A, mode = "undirected")
# From an edge list
edges <- data.frame(from = c("USA","USA","CHN"), to = c("CHN","DEU","DEU"))
g2 <- graph_from_data_frame(edges, directed = FALSE)
# Inspect
vcount(g) # 3 nodes
ecount(g) # 2 edges
Level 0 – Simple
Level 1 – Directed
Level 2 – Weighted
Three representations of a toy trade network (USA, CHN, DEU): undirected/unweighted → directed → directed and weighted.
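A minimal sketch of the directed and weighted levels. The edge directions and the `weight` values are made-up numbers for illustration, not trade data:

```r
library(igraph)

# Level 1: directed edges (exporter -> importer); directions are illustrative
edges_dir <- data.frame(
  from = c("USA", "CHN", "DEU"),
  to   = c("CHN", "DEU", "USA")
)
g_dir <- graph_from_data_frame(edges_dir, directed = TRUE)

# Level 2: extra columns in the edge list become edge attributes
edges_w <- cbind(edges_dir, weight = c(120, 80, 60))  # hypothetical trade values
g_w <- graph_from_data_frame(edges_w, directed = TRUE)

is_directed(g_dir)   # TRUE
is_weighted(g_w)     # TRUE

# Weighted adjacency matrix: entry (i, j) is the flow from i to j
as_adjacency_matrix(g_w, attr = "weight", sparse = FALSE)
```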
library(igraph)
library(tibble) # provides tibble()
node_list <- tibble(id = 1:4)
edge_list <- tibble(from = c(1, 2, 2, 3, 4), to = c(2, 3, 4, 2, 1))
directed_g <- graph_from_data_frame(d = edge_list,
                                    vertices = node_list, directed = TRUE)
# as_adjacency_matrix() replaces the deprecated get.adjacency()
as_adjacency_matrix(directed_g)
#> 4 x 4 sparse Matrix of class "dgCMatrix"
#> 1 2 3 4
#> 1 . 1 . .
#> 2 . . 1 1
#> 3 . 1 . .
#> 4 1 . . .
Complete Graph

Star

Tree

Bipartite Network

# generate a dataframe to represent all the edges of your bipartite network
d <- data.frame(
country = c("DEU", "DEU", "FRA", "FRA", "CAN", "CAN", "USA"),
trade_agr = c("CETA", "EU", "EU", "CETA", "CETA", "USMCA", "USMCA")
)
# transform it into a graph
g <- graph_from_data_frame(d, directed = FALSE)
# define color and shape mappings to distinguish node types
V(g)$label <- V(g)$name
V(g)$type <- 1
V(g)[name %in% d$trade_agr]$type <- 2
col <- c("steelblue", "orange")
shape <- c("circle", "square")
plot(g,
vertex.color = col[V(g)$type],
     vertex.shape = shape[V(g)$type])
Metrics
Economic reading – The geodesic length tells you how many trade “hops” a Chilean export shock needs to reach Germany.
Walk (can revisit nodes) | Path (no repeated nodes) | Geodesic (shortest path, distance = 2)
How many paths from 3 to 1? Which is shortest?
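Questions like this can be checked in igraph. The sketch below assumes the question refers to the four-node directed graph built from the edge list earlier; if it refers to a different figure, only the edge list needs to change.

```r
library(igraph)

# Rebuild the four-node directed graph from the earlier edge list
edge_list <- data.frame(from = c(1, 2, 2, 3, 4), to = c(2, 3, 4, 2, 1))
directed_g <- graph_from_data_frame(edge_list, directed = TRUE)

# All simple paths (no repeated nodes) from node 3 to node 1, following edge direction
paths_3_1 <- all_simple_paths(directed_g, from = "3", to = "1", mode = "out")
lapply(paths_3_1, as_ids)

# Geodesic distance (number of hops on a shortest path) and one shortest path
d_3_1 <- distances(directed_g, v = "3", to = "1", mode = "out")
sp <- shortest_paths(directed_g, from = "3", to = "1", mode = "out")$vpath[[1]]
as_ids(sp)
```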
Reading – Low diameter + high giant-component share imply shocks can spread globally; low density curbs redundancy.
Pentagon graph: CHL–USA–BRA–DEU–CHL, BRA–CHN. Five nodes, five edges.
| Metric | Value |
|---|---|
| Nodes \(n\) | 5 |
| Edges \(m\) | 5 |
| Density \(\delta\) | 0.50 |
| Giant component share | 100% |
| Diameter | 3 |
| Avg. path length \(\bar{\ell}\) | 1.6 |
All five countries sit in one component; any shock crosses the network in 3 hops or fewer.
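These network-level statistics can be computed directly in igraph. A minimal sketch, assuming the five-country graph is exactly the edge set listed above (the object name `g5` is ours):

```r
library(igraph)

# Five-country graph: the cycle CHL-USA-BRA-DEU-CHL plus the link BRA-CHN
g5 <- graph_from_literal(CHL - USA - BRA - DEU - CHL, BRA - CHN)

vcount(g5)          # number of nodes n
ecount(g5)          # number of edges m
edge_density(g5)    # m divided by n(n - 1)/2 possible links
diameter(g5)        # longest geodesic in the network
mean_distance(g5)   # average geodesic length over all connected pairs

# Share of nodes in the largest connected component
comp <- components(g5)
max(comp$csize) / vcount(g5)
```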
| Network Type | Structure / Intuition | Density | Avg. Path | Economic Context |
|---|---|---|---|---|
| Star (hub-and-spoke) | One central hub connected to all others | Low | Very short | Logistics, supply chains, platform economies |
| Core–periphery | Dense central group + sparse outer nodes | Medium | Short to moderate | Global trade hierarchy: developed vs. emerging |
| Modular (community) | Dense internal clusters with few inter-cluster links | Medium | Moderate | Regional trade blocs, innovation clusters |
| Scale-free | Hubs dominate; many nodes with few links | Low | Very short | Financial contagion, tech networks |
| Bipartite (countries–products) | Two node types (e.g., exporters and goods) | Structured | Varies | Economic complexity, RCA-based trade analysis |
Different network shapes reflect different economic dynamics – efficiency, fragility, inequality, or specialization.
Bernard et al. (2018) – Firm-to-firm trade between US and Norwegian firms
(a) HS 847990 – One Product
Sparse, modular network with clear firm clusters. Suggests specialized, non-overlapping supply chains. Shocks likely stay localized unless a hub is affected.
(b) All Products
Dense, tangled core with many interconnections. Reflects scale-free or core–periphery structure. Efficient but more exposed to contagion via central firms.
Carattini et al. (2022): countries are nodes, and an edge links two countries when they share an environmental agreement.
Reference: Carattini et al. (2022)
Centrality
Take-away – centrality turns raw topology into economic influence.
Pentagon graph with BRA highlighted (degree 3 — widest direct reach).
| Country | \(k_i\) |
|---|---|
| CHL | 2 |
| USA | 2 |
| BRA | 3 |
| DEU | 2 |
| CHN | 1 |
BRA links to three partners – widest direct reach.
Pentagon graph with BRA highlighted (closeness 0.80 — never farther than 2 hops).
| Country | \(C_i\) |
|---|---|
| CHL | 0.57 |
| BRA | 0.80 |
| USA | 0.67 |
| DEU | 0.67 |
| CHN | 0.50 |
BRA never farther than 2 hops – quickest access to all.
Betweenness centrality (broker power)
\[B_i = \sum_{\substack{s\neq i\neq t \\ s<t}} \frac{\sigma_{st}(i)}{\sigma_{st}}\]
| Country | \(B_i\) |
|---|---|
| CHL | 0.5 |
| BRA | 3.5 |
| USA | 1.0 |
| DEU | 1.0 |
| CHN | 0 |
BRA lies on every shortest route to CHN and on one of the two USA–DEU routes – choke-point.
Eigenvector centrality (inherited influence)
\[e_i = \frac{1}{\lambda}\sum_{j=1}^{n} A_{ij}\,e_j\]
| Country | \(e_i\) |
|---|---|
| CHL | 0.78 |
| BRA | 1.00 |
| USA | 0.83 |
| DEU | 0.83 |
| CHN | 0.47 |
Scores are scaled so that the maximum is 1. BRA ranks first; CHN has a single link, but it is to top-ranked BRA, so it still inherits part of that status.
All four measures are one-liners in igraph. The intuition matters more than the derivation.
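A minimal sketch of those one-liners, assuming the five-country graph CHL–USA–BRA–DEU–CHL plus BRA–CHN (the object names `g5` and `centrality` are ours):

```r
library(igraph)

# Five-country graph: the cycle CHL-USA-BRA-DEU-CHL plus the link BRA-CHN
g5 <- graph_from_literal(CHL - USA - BRA - DEU - CHL, BRA - CHN)

centrality <- data.frame(
  degree      = degree(g5),                         # number of direct partners
  closeness   = closeness(g5, normalized = TRUE),   # (n - 1) / sum of distances
  betweenness = betweenness(g5, directed = FALSE),  # raw count, each pair s < t once
  eigenvector = eigen_centrality(g5)$vector         # igraph rescales so the max is 1
)
round(centrality, 2)
```

Note the scaling choices: `closeness()` is unnormalised by default, and `betweenness()` returns raw pair counts unless `normalized = TRUE` is set.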
Null Models
Complete graph on 6 nodes — every node connected to every other (degree 5).
Complete graph – All nodes have degree 5
Histogram: single bar at degree 5 with height 6 — uniform, no heterogeneity.
Degree distribution – Uniform, no heterogeneity
Real trade networks are rarely this uniform: most have few links, a few have many.
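Two common benchmarks: the Erdős–Rényi random graph \(G(n,p)\) and the configuration model (CM), which randomises links while keeping each node's degree fixed.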
Compare Real Network Metric vs. CM Metric (e.g., Avg. Path Length \(\bar{\ell}\))
Purpose: Does structure beyond individual node degrees matter?
Common Interpretations for \(\bar{\ell}\):
Real \(\bar{\ell}\) < CM \(\bar{\ell}\): Network is more efficiently connected than expected by chance (given degrees). \(\rightarrow\) Suggests specific organizing principles (e.g., hubs).
Real \(\bar{\ell}\) > CM \(\bar{\ell}\): Network is more fragmented / distant than expected by chance (given degrees). \(\rightarrow\) Suggests barriers or clustering.
\(\implies\) Null models help isolate the impact of non-random network topology.
| Model | Best suited question | igraph call |
|---|---|---|
| Erdős–Rényi \(G(n,p)\) | “Is the network denser than random?” | sample_gnp(n, p) |
| Configuration model | “Is structure surprising given node degrees?” | sample_degseq() |
Other approaches: edge-rewiring (rewire(..., keeping_degseq())), gravity-constrained null models (common in trade economics — simulate from gravity model, threshold to get links).
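The comparison can be sketched as follows. Two assumptions: `g_real` is a simulated stand-in for an observed network (a preferential-attachment graph), and the degree-preserving null is generated by edge rewiring with `keeping_degseq()` rather than by `sample_degseq()`, so that every null graph stays simple.

```r
library(igraph)
set.seed(1)

# Stand-in for an observed network (assumption: preferential attachment)
g_real <- sample_pa(200, m = 2, directed = FALSE)
l_real <- mean_distance(g_real)

# Null 1: Erdős–Rényi graphs with the same size and density
n <- vcount(g_real)
p <- edge_density(g_real)
l_er <- replicate(100, mean_distance(sample_gnp(n, p)))

# Null 2: degree-preserving rewiring (every node keeps its degree)
rewire_once <- function(g) rewire(g, keeping_degseq(niter = 10 * ecount(g)))
l_cm <- replicate(100, mean_distance(rewire_once(g_real)))

# Observed value against the two null distributions
c(real = l_real, er_mean = mean(l_er), cm_mean = mean(l_cm))
mean(l_cm <= l_real)  # share of degree-preserving nulls at or below the observed value
```

With real data, replace `g_real` by the observed graph; the share in the last line is a simple empirical p-value for the average path length.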
Applications
“The productivity of a country resides in the diversity of its available non-tradable capabilities, and therefore, cross-country differences in income can be explained by differences in economic complexity, as measured by the diversity of capabilities present in a country and their interactions.”
– Hidalgo and Hausmann 2009
Source: Cristelli, Tacchella, Pietronero (2014)
A country is able to produce a product when it has the capabilities to do so (Hidalgo & Hausmann 2009).
Source: Hidalgo et al. (2009)
Let us index countries with \(c=1,\dots,n\) and products with \(p\).
The bipartite network is represented by a biadjacency matrix \(\mathbf{B}\) of size \(n \times p\):
\[ B_{cp}=\begin{cases} 1, & \text{if country } c \text{ is a significant exporter of product } p \\ 0, & \text{otherwise} \end{cases} \]
Significant exporter when:
\[ RCA_{cp}= \frac{\frac{q_{cp}}{\sum_{p} q_{cp}}}{\frac{\sum_{c}q_{cp}}{\sum_{c}\sum_{p}q_{cp}}} > 1 \]
i.e., whenever the share of product \(p\) in the country export basket is larger than its share in world trade.
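A minimal sketch of the RCA filter in base R. The country names, product names and export values in `q` are invented for illustration:

```r
# Toy export values q[c, p]: rows are countries, columns are products (made-up numbers)
q <- matrix(c(40, 40,  5, 15,
               0,  5, 40,  5,
               0,  0,  0, 20),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("DEU", "CHL", "BGD"),
                            c("cars", "machines", "copper", "textiles")))

country_share <- q / rowSums(q)        # share of product p in country c's export basket
world_share   <- colSums(q) / sum(q)   # share of product p in world trade

# RCA: divide each column of country shares by the product's world share
rca <- sweep(country_share, 2, world_share, "/")

# Biadjacency matrix: 1 if the country is a significant exporter (RCA > 1)
B <- (rca > 1) * 1
B

rowSums(B)  # diversity: number of products each country exports with RCA > 1
colSums(B)  # ubiquity: number of countries exporting each product with RCA > 1
```

In this toy example DEU is a significant exporter of cars and machines, CHL of copper, and BGD of textiles.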
Key idea: iterate between diversification (how many products does a country export?) and ubiquity (how many countries export a product?).
Source: Hidalgo et al. (2007)
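One common way to formalise the iteration between diversity and ubiquity is the method of reflections of Hidalgo and Hausmann (2009). The sketch below assumes that is the intended procedure; the helper `reflect()` and the toy matrix are ours.

```r
# Method of reflections:
#   k_{c,N} = (1 / k_{c,0}) * sum_p B_cp * k_{p,N-1}
#   k_{p,N} = (1 / k_{p,0}) * sum_c B_cp * k_{c,N-1}
reflect <- function(B, n_iter = 2) {
  kc0 <- rowSums(B)  # diversity k_{c,0}
  kp0 <- colSums(B)  # ubiquity  k_{p,0}
  kc <- kc0
  kp <- kp0
  for (N in seq_len(n_iter)) {
    kc_new <- as.vector(B %*% kp) / kc0     # average of the products' previous scores
    kp_new <- as.vector(t(B) %*% kc) / kp0  # average of the exporters' previous scores
    kc <- kc_new
    kp <- kp_new
  }
  list(country = setNames(kc, rownames(B)), product = setNames(kp, colnames(B)))
}

# Toy biadjacency matrix (made-up): country A exports everything, C only p1
B <- rbind(A = c(1, 1, 1),
           B = c(1, 1, 0),
           C = c(1, 0, 0))
colnames(B) <- c("p1", "p2", "p3")

reflect(B, n_iter = 1)  # k_{c,1}: average ubiquity of each country's products
```

After one iteration the diversified country A exports products of low average ubiquity, while C exports only the most ubiquitous product.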
Let \(\mathbf{B}\) be the \(n \times p\) biadjacency matrix where \(B_{cp} = 1\) if country \(c\) significantly exports product \(p\).
Step: Project onto products
Define a product–product relatedness matrix \(\mathbf{M}\):
\[M_{pp'} = \sum_c B_{cp} \cdot B_{cp'}\]
This gives the number of countries that export both products \(p\) and \(p'\). The matrix \(\mathbf{M}\) is symmetric and captures the co-export intensity between products.
Weighted version: Normalise by product ubiquity:
\[\phi_{pp'} = \frac{\sum_c B_{cp} \cdot B_{cp'}}{\max(k_{p,0},\, k_{p',0})}\]
where \(k_{p,0} = \sum_c B_{cp}\) is the number of exporters of product \(p\).
Interpretation: Two products are close in the product space if many countries export them both.
Source: Hidalgo et al. (2007)
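The projection and its normalised version take a few lines of matrix algebra. A minimal sketch on a made-up biadjacency matrix (four countries, three products):

```r
library(igraph)

# Toy biadjacency matrix B[c, p] (made-up): 1 = country c significantly exports p
B <- rbind(c(1, 1, 0),
           c(1, 1, 1),
           c(0, 1, 1),
           c(1, 0, 0))
dimnames(B) <- list(paste0("c", 1:4), c("p1", "p2", "p3"))

# Co-export counts: M[p, p'] = number of countries exporting both p and p'
M <- t(B) %*% B

# Ubiquity k_{p,0} and proximity phi = M / max(k_p, k_p')
k_p0 <- colSums(B)
phi <- M / outer(k_p0, k_p0, pmax)
round(phi, 2)

# Product space as a weighted, undirected graph (self-loops dropped)
product_space <- graph_from_adjacency_matrix(phi, mode = "undirected",
                                             weighted = TRUE, diag = FALSE)
E(product_space)$weight
```

Here p1 and p2 are co-exported by two of their three exporters each, so their proximity is 2/3, while p1 and p3 share only one exporter.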
Modern R Tools for Networks
tidygraph wraps igraph in a tidy interface; ggraph provides ggplot2-style network plots:
library(tidygraph)
library(ggraph)
tg <- as_tbl_graph(directed_g) |>
mutate(centrality = centrality_degree())
ggraph(tg, layout = "stress") +
geom_edge_link(arrow = arrow(length = unit(3, "mm")),
end_cap = circle(3, "mm")) +
geom_node_point(aes(size = centrality), color = "steelblue") +
geom_node_text(aes(label = name), repel = TRUE) +
theme_graph()
ggraph produces publication-quality network plots
Identify clusters of densely connected nodes:
library(igraph)
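# Louvain algorithm (fast, widely used)
set.seed(123) # the random graph and the Louvain algorithm are both stochastic
g <- sample_gnp(100, 0.05)
communities <- cluster_louvain(g)
membership(communities) # which community each node belongs to
modularity(communities) # quality of the partition (between -0.5 and 1; higher is better)
# Leiden algorithm (improved Louvain, avoids poorly-connected communities)
# cluster_leiden() defaults to the CPM objective; request modularity to compare with Louvain
communities_leiden <- cluster_leiden(g, objective_function = "modularity", resolution = 1)
Economic application: community detection on trade networks reveals regional blocs, production clusters, or supply chain modules.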
Suggested references: