Data Science for Economists
2026-03-01
Spatial Concepts & Applications
Why are inventors concentrated in certain locations?
\[y_{ig} = \gamma x_{ig} + \beta m_y(y_g) + \delta m_x(x_g) + \epsilon_{ig}\]
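The group outcome term \(m_y(y_g)\) aggregates the outcomes or behaviours of the other group members. Because those outcomes depend on the members' characteristics, it is itself largely an aggregation of individual characteristics over the same group members, just like \(m_x(\cdot)\).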
\(\rightarrow\) Multicollinearity
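One way to see where the multicollinearity comes from is the following sketch. It assumes (this is not stated above) that both aggregators are group means, \(m_y(y_g) = E[y \mid g]\) and \(m_x(\cdot) = E[x \mid g]\), that \(E[\epsilon_{ig} \mid g] = 0\), and that \(\beta \neq 1\). Taking group expectations of the model gives

\[E[y \mid g] = \gamma E[x \mid g] + \beta E[y \mid g] + \delta E[x \mid g] \quad\Rightarrow\quad E[y \mid g] = \frac{\gamma + \delta}{1 - \beta}\, E[x \mid g]\]

Under these assumptions the group mean outcome is an exact linear function of the group mean characteristics, so \(\beta\) and \(\delta\) cannot be separated from each other. This is often called the reflection problem.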
How to measure concentration of patents across regions in a country?
\[\text{Conc} = \sum_{g=1}^{n} |s_g - s|\]
where \(s_g\) is the number of patents per capita in region \(g\), for \(g \in \{1, \dots, n\}\), and \(s\) is the number of patents per capita in the whole economy
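A minimal sketch of this index in plain Python may help make the definition concrete. The function name `concentration_index` and the three-region data are hypothetical, chosen only for illustration.

```python
def concentration_index(patents, population):
    """Sum over regions of |s_g - s|, with s_g = patents per capita in region g."""
    # s: patents per capita in the whole economy
    s = sum(patents) / sum(population)
    # s_g is computed region by region, then compared with s
    return sum(abs(p / pop - s) for p, pop in zip(patents, population))


# Hypothetical data for three regions (not from the lecture)
patents = [10, 30, 60]
population = [100, 100, 200]
conc = concentration_index(patents, population)
print(round(conc, 4))
```

The index is zero when every region has the same number of patents per capita as the economy as a whole, and grows as regional rates move away from the national rate.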
The Gini coefficient is usually computed by ranking people by income; here, regions are ranked by number of patents instead
It is equivalent to half the relative mean absolute difference
\[G = \frac{\sum_{g=1}^{n}\sum_{j=1}^{n} |x_g - x_j|}{2n^2 \bar{x}}\]
where \(x_g\) is the number of patents in region \(g\).
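The formula for \(G\) can be transcribed almost literally. The sketch below uses a double loop over all ordered pairs of regions, which is fine for a few hundred regions; the function name `gini` and the example counts are hypothetical.

```python
def gini(x):
    """G = sum_g sum_j |x_g - x_j| / (2 * n^2 * mean(x))."""
    n = len(x)
    mean = sum(x) / n
    # Sum of absolute differences over all ordered pairs (g, j)
    total = sum(abs(a - b) for a in x for b in x)
    return total / (2 * n**2 * mean)


# Hypothetical patent counts per region
equal = gini([5, 5, 5, 5])          # every region has the same count
concentrated = gini([0, 0, 0, 20])  # one region holds all patents
```

With equal counts the index is zero. When a single region holds all patents, this version of the formula gives \((n-1)/n\) rather than exactly one.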
Key idea (Rey & Smith 2013): decompose inequality into a neighbor component and a non-neighbor component. If inequality among neighbors is lower than among non-neighbors → positive spatial autocorrelation (similar regions cluster).
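A sketch of the decomposition, assuming a binary, symmetric neighbor matrix `W` with a zero diagonal (the matrix, the function name and the data are hypothetical). Each pairwise difference is assigned to the neighbor or the non-neighbor component, and the two components add up to the overall index.

```python
def spatial_gini_components(x, W):
    """Split the pairwise-difference index into neighbor and non-neighbor parts.

    x: list of regional values (e.g. patent counts)
    W: n x n list of lists, W[i][j] = 1 if i and j are neighbors, else 0
    """
    n = len(x)
    denom = 2 * n**2 * (sum(x) / n)
    neighbor = 0.0
    non_neighbor = 0.0
    for i in range(n):
        for j in range(n):
            d = abs(x[i] - x[j])
            if W[i][j]:
                neighbor += d       # pair (i, j) are neighbors
            else:
                non_neighbor += d   # includes i == j, where d is 0
    return neighbor / denom, non_neighbor / denom


# Hypothetical example: four regions on a line, similar values side by side
x = [1, 1, 5, 5]
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
neighbor_part, non_neighbor_part = spatial_gini_components(x, W)
```

In this example most of the inequality sits in the non-neighbor component, because similar regions are adjacent. Since there are usually far fewer neighbor pairs than non-neighbor pairs, the raw components are best judged against a benchmark, for instance by recomputing them after randomly reshuffling values across regions; that step is not shown here.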
With spatial interconnection matrix \(G\) (not the inequality index \(G\) defined earlier), a general spatial model includes:
| Model | What it includes |
|---|---|
| SAR | Only endogenous spatial lag (\(Gy\)) |
| SLX | Only contextual effects (\(GX\)) |
| SDM | Both endogenous + contextual |
| SEM | Spatial structure in errors only |
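The four rows can be read as restrictions on one nesting equation. The sketch below reuses \(\gamma\), \(\beta\) and \(\delta\) from the linear-in-means model for the own, endogenous and contextual effects. It introduces \(\lambda\) for the spatial error coefficient, which is an assumed symbol not used elsewhere in these notes.

\[y = \beta G y + X \gamma + G X \delta + u, \qquad u = \lambda G u + \epsilon\]

| Model | Restriction on the nesting equation |
|---|---|
| SAR | \(\delta = 0,\ \lambda = 0\) |
| SLX | \(\beta = 0,\ \lambda = 0\) |
| SDM | \(\lambda = 0\) |
| SEM | \(\beta = 0,\ \delta = 0\) |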
Core issue: endogenous, contextual, and correlated effects cannot be separately identified from the reduced form — OLS confounds all three.
Solutions:
Application: Chinese aid allocation