07. Spatial & Satellite Data

Data Science for Economists

2026-03-01

Spatial Concepts & Applications

Location of patent inventors

Stek (2020)

Patterns in spatial data

Why are inventors concentrated in certain locations?

  1. Common influences
    • Quality of education, infrastructure, etc.
    • Economic activity, jobs, etc.
  2. Spillover effects
    • Social interactions
    • Information sharing
    • Skill hubs

Analysis of spatial data

  • Can you identify the two effects?
  • Idea: Estimate the following model via OLS

\[y_{ig} = \gamma x_{ig} + \beta m_y(y_g) + \delta m_x(x_g) + \epsilon_{ig}\]

  • \(y_{ig}\) is the number of patents of inventor \(i\) located in area \(g\)
  • \(x_{ig}\) is an individual characteristic (e.g. age)
  • \(m_y, m_x\) aggregate outcomes and characteristics over units spatially connected with location \(g\)
    • e.g. the average number of patents (\(m_y\)) and the average age (\(m_x\)) in the same location
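A minimal simulation sketch of this OLS idea, under assumed parameter values (\(\gamma = 0.5\), \(\beta = 0.3\), \(\delta = 0.4\)) and with \(m_y\) and \(m_x\) taken to be simple location averages that include inventor \(i\) (a simplifying assumption). The data are generated from the model itself, so the location average of \(y\) is consistent with the equation; the point is to see how closely the two location averages move together.

```r
set.seed(1)
n_groups <- 50; n_per <- 500               # hypothetical: 50 locations, 500 inventors each
g <- rep(seq_len(n_groups), each = n_per)

# Hypothetical individual characteristic (e.g. standardized age) with a
# location-specific mean, so that locations differ
x   <- rep(rnorm(n_groups), each = n_per) + rnorm(n_groups * n_per)
eps <- rnorm(n_groups * n_per)

gamma <- 0.5; beta <- 0.3; delta <- 0.4    # assumed true parameters

x_bar   <- ave(x, g)                       # m_x: location average of x
eps_bar <- ave(eps, g)

# Location average outcome implied by the model:
# y_bar = gamma * x_bar + beta * y_bar + delta * x_bar + eps_bar, solved for y_bar
y_bar <- ((gamma + delta) * x_bar + eps_bar) / (1 - beta)
y     <- gamma * x + beta * y_bar + delta * x_bar + eps

all.equal(ave(y, g), y_bar)                # m_y really is the location average of y
cor(y_bar, x_bar)                          # close to 1: m_y is nearly a linear function of m_x

# OLS with both location averages: large standard errors on y_bar and x_bar
summary(lm(y ~ x + y_bar + x_bar))$coefficients
```

In this setup the average outcome also contains the inventor's own error term, which is a second problem for OLS on top of the near-collinearity.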

The reflection problem

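The average outcome in a location aggregates the outcomes of the other group members, and those outcomes are themselves driven by individual characteristics. In equilibrium, \(m_y\) is therefore (close to) a linear function of the aggregated characteristics \(m_x\).

\(\rightarrow\) Multicollinearity: \(\beta\) and \(\delta\) cannot be separately identified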
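Taking \(m_y\) and \(m_x\) to be simple location averages \(\bar y_g\) and \(\bar x_g\) (an assumption for this sketch), averaging the model over the inventors in \(g\) makes the problem explicit:

\[
\begin{aligned}
\bar y_g &= \gamma \bar x_g + \beta \bar y_g + \delta \bar x_g + \bar\epsilon_g \\
\Rightarrow\ \bar y_g &= \frac{\gamma + \delta}{1 - \beta}\,\bar x_g + \frac{\bar\epsilon_g}{1 - \beta}, \qquad \beta \neq 1 \\
\Rightarrow\ y_{ig} &= \gamma x_{ig} + \frac{\beta\gamma + \delta}{1 - \beta}\,\bar x_g + \epsilon_{ig} + \frac{\beta}{1 - \beta}\,\bar\epsilon_g
\end{aligned}
\]

The reduced form identifies \(\gamma\) and the single composite \((\beta\gamma + \delta)/(1 - \beta)\), but not \(\beta\) and \(\delta\) separately.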

Content of spatial data

  • Position data in 2D (or 3D) (location of inventors)
    • sometimes areal entities stored as polygons (e.g. regions) instead of points
  • Attribute data (number of patents)
  • Metadata related to the position data (characteristics of location)
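A minimal sketch of how these pieces sit together in an `sf` object, using hypothetical inventor coordinates and patent counts (not real data). The coordinate columns become the geometry (position data), other columns stay as attributes, and the coordinate reference system is one piece of metadata attached to the positions.

```r
library(sf)

# Hypothetical toy data: one row per inventor
inventors_df <- data.frame(
  inventor  = c("A", "B", "C"),
  lon       = c(8.54, 11.58, 13.40),   # position data: longitude
  lat       = c(47.37, 48.14, 52.52),  # position data: latitude
  n_patents = c(3, 7, 2)               # attribute data
)

# Turn the coordinate columns into a POINT geometry column;
# crs = 4326 records that the coordinates are WGS84 longitude/latitude
inventors <- st_as_sf(inventors_df, coords = c("lon", "lat"), crs = 4326)

st_geometry_type(inventors)   # POINT for every row
st_crs(inventors)$epsg        # 4326
inventors$n_patents           # attributes stay in ordinary columns
```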

Units of geographical space

Quick start: reading and plotting spatial data

library(sf)
library(ggplot2)
library(rnaturalearth)

# Load country boundaries
world <- ne_countries(scale = "medium", returnclass = "sf")

# Plot a simple map
ggplot(world) +
  geom_sf(fill = "lightgray", color = "white") +
  theme_minimal() +
  labs(title = "World countries (Natural Earth)")

Measuring spatial concentration

How to measure concentration of patents across regions in a country?

  • Krugman specialization/concentration index

\[\text{Conc} = \sum_{g=1}^{n} |s_g - s|\]

where \(s_g\) is the number of patents per capita in region \(g\), \(g \in \{1, \dots, n\}\), and \(s\) is the number of patents per capita in the whole economy

  • Spatial Gini Index
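The Krugman-type index defined above takes a few lines of base R. A sketch with hypothetical patent counts and populations for three regions; per-capita rates are expressed as patents per million inhabitants, which only rescales the index.

```r
# Hypothetical data: patents and population in three regions
patents    <- c(120, 30, 50)
population <- c(1.0e6, 0.5e6, 1.5e6)

s_g <- patents / population * 1e6             # patents per million inhabitants, by region
s   <- sum(patents) / sum(population) * 1e6   # patents per million in the whole economy

conc <- sum(abs(s_g - s))                     # sum of absolute deviations from the national rate
conc                                          # about 93.33
```

The index is 0 when every region has the national per-capita rate and grows as patenting concentrates in some regions relative to their population.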

Gini Index

The standard Gini index ranks people by income; here, regions are ranked by number of patents instead.

Equivalently, \(G\) is half the relative mean absolute difference:

\[G = \frac{\sum_{g=1}^{n}\sum_{j=1}^{n} |x_g - x_j|}{2n^2 \bar{x}}\]

where \(x_g\) is the number of patents in region \(g\) and \(\bar{x}\) is the mean number of patents across the \(n\) regions.
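A direct base R translation of this formula, as a sketch: it forms all \(n^2\) pairwise differences, which is fine for a few hundred regions (for very large \(n\), a sort-based formula is cheaper).

```r
# Gini index of patents across regions via the mean absolute difference
gini <- function(x) {
  n <- length(x)
  sum(abs(outer(x, x, "-"))) / (2 * n^2 * mean(x))
}

gini(c(10, 20, 30, 40))   # 0.25
gini(c(25, 25, 25, 25))   # 0: patents spread evenly across regions
gini(c(0, 0, 0, 100))     # 0.75: all patents in one region, the maximum (n - 1) / n
```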

Spatial decomposition of the Gini coefficient

Key idea (Rey & Smith 2013): decompose inequality into a neighbor component and a non-neighbor component. If inequality among neighbors is lower than among non-neighbors → positive spatial autocorrelation (similar regions cluster).
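With a binary contiguity matrix \(w_{gj}\) (1 if regions \(g\) and \(j\) are neighbors, 0 otherwise), the Gini numerator splits into neighbor and non-neighbor pairs:

\[G = \frac{\sum_{g}\sum_{j} w_{gj}|x_g - x_j|}{2n^2 \bar{x}} + \frac{\sum_{g}\sum_{j} (1 - w_{gj})|x_g - x_j|}{2n^2 \bar{x}}\]

A base R sketch in the spirit of Rey & Smith (2013), assuming a hypothetical line of four regions. Inference here is a simple random permutation of patents across regions: if similar regions cluster, the observed neighbor component should be small relative to the permuted ones.

```r
# Split the Gini coefficient into neighbor and non-neighbor components
gini_decomp <- function(x, W) {
  n <- length(x)
  abs_diff <- abs(outer(x, x, "-"))        # |x_g - x_j| for all ordered pairs
  denom <- 2 * n^2 * mean(x)
  nb  <- sum(W * abs_diff) / denom         # pairs that are neighbors
  non <- sum((1 - W) * abs_diff) / denom   # all other pairs (diagonal terms are 0)
  list(gini = nb + non, neighbor = nb, non_neighbor = non)
}

# Hypothetical example: 4 regions on a line, each adjacent to the next
x <- c(10, 20, 30, 40)                     # patents per region
W <- matrix(0, 4, 4)
W[cbind(1:3, 2:4)] <- 1
W[cbind(2:4, 1:3)] <- 1                    # symmetric binary contiguity

dec <- gini_decomp(x, W)
dec$gini       # 0.25, the same value as the aspatial Gini
dec$neighbor   # 0.075

# Permutation inference: reshuffle patents across regions, keep W fixed
set.seed(1)
perm_nb <- replicate(999, gini_decomp(sample(x), W)$neighbor)
p_value <- (sum(perm_nb <= dec$neighbor) + 1) / (999 + 1)
```

With only four regions there are just 24 possible arrangements, so the permutation test has little power; the example only illustrates the mechanics.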

Non-randomness in spatial data

Complete Random Allocation in 2D

Incomplete Random Allocation in 2D
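A small base R simulation contrasting the two cases on the unit square. Complete random allocation draws every point uniformly; the incomplete case is one hypothetical way to generate it, scattering points around a few randomly placed centers. The variance-to-mean ratio of counts in a grid of quadrats is about 1 under complete randomness and well above 1 when points cluster.

```r
set.seed(7)
n_pts <- 500; k <- 10     # number of points, and a k x k grid of quadrats

# Complete random allocation: every location equally likely
csr <- cbind(runif(n_pts), runif(n_pts))

# Incomplete random allocation (hypothetical clustered process):
# each point is placed near one of 10 random centers, wrapped back into [0, 1)
centers <- cbind(runif(10), runif(10))
id   <- sample(10, n_pts, replace = TRUE)
clus <- (centers[id, ] + matrix(rnorm(2 * n_pts, sd = 0.03), ncol = 2)) %% 1

# Variance-to-mean ratio of quadrat counts (empty quadrats included)
vmr <- function(pts) {
  cell <- pmin(floor(pts[, 1] * k), k - 1) + k * pmin(floor(pts[, 2] * k), k - 1)
  counts <- as.vector(table(factor(cell, levels = 0:(k^2 - 1))))
  var(counts) / mean(counts)
}

vmr(csr)    # close to 1
vmr(clus)   # much larger than 1
```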

The economics of spatial non-randomness

  1. Random allocation, characteristics of location vary
    • Farmers randomly allocated, but crops depend on soil etc. (Holmes and Lee, 2012)
  2. Non-random allocation, location characteristics have no causal effect
    • R&D in Silicon Valley (Ellison and Glaeser, 1997)
  3. Random allocation, interactions matter
    • College dormitory allocation and peer effects in choice of majors (Sacerdote, 2001)
  4. Non-random allocation, interactions matter
    • Childhood neighborhood effects (Gibbons, 2013)

Spatial models: the key challenge

With spatial interconnection matrix \(G\), a general spatial model includes:

  • Endogenous effects (\(Gy\)): neighbors’ outcomes affect yours
  • Contextual effects (\(GX\)): neighbors’ characteristics affect yours
  • Correlated effects (\(Gv\)): shared unobservables
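One common way to write a model that nests all three channels (a sketch; the coefficient labels \(\rho\), \(\theta\), \(\lambda\) are introduced here for illustration):

\[y = \rho G y + X\beta + G X \theta + v, \qquad v = \lambda G v + \varepsilon\]

Setting \(\theta = \lambda = 0\) gives SAR, \(\rho = \lambda = 0\) gives SLX, \(\lambda = 0\) gives SDM, and \(\rho = \theta = 0\) gives SEM.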
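
| Model | What it includes |
|---|---|
| SAR (spatial autoregressive) | Only the endogenous spatial lag (\(Gy\)) |
| SLX (spatial lag of X) | Only contextual effects (\(GX\)) |
| SDM (spatial Durbin model) | Both endogenous and contextual effects |
| SEM (spatial error model) | Spatial structure in the errors only |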
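To see why the endogenous lag \(Gy\) matters, here is a minimal SAR sketch in base R, assuming a hypothetical ring of 10 regions with a row-standardized \(G\), \(\rho = 0.4\) and \(\beta = 2\). Solving \(y = \rho G y + x\beta + \varepsilon\) gives \(y = (I - \rho G)^{-1}(x\beta + \varepsilon)\), so a change in \(x\) in one region spills over to all others through the multiplier matrix.

```r
n <- 10; rho <- 0.4; beta <- 2    # hypothetical values

# Row-standardized ring: each region has two neighbors with weight 1/2
G <- matrix(0, n, n)
for (i in seq_len(n)) {
  G[i, (i %% n) + 1]       <- 0.5   # next region on the ring
  G[i, ((i - 2) %% n) + 1] <- 0.5   # previous region on the ring
}

S <- solve(diag(n) - rho * G)     # spatial multiplier matrix (I - rho G)^{-1}

# Simulate outcomes from the reduced form
set.seed(3)
x   <- rnorm(n)
eps <- rnorm(n, sd = 0.1)
y   <- S %*% (beta * x + eps)

# Raising x by one unit everywhere raises every y by beta / (1 - rho), not beta
total_effect <- S %*% rep(beta, n)
total_effect[1]                   # beta / (1 - rho) = 3.33
```

Because the rows of \(G\) sum to one, every row of \(S\) sums to \(1/(1 - \rho)\), which is where the multiplier comes from.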

The reflection problem (Manski, 1993)

Core issue: endogenous, contextual, and correlated effects cannot be separately identified from the reduced form — OLS confounds all three.

Solutions:

  • Non-linear functional forms (Brock & Durlauf 2001)
  • Exclusion restrictions (assume away some channels)
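  • Incomplete interactions (\(GG \neq G\)): neighbors' neighbors who are not direct neighbors (e.g. \(G^2X\)) provide excluded instruments for \(Gy\)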
  • Spatial differencing (Holmes 1998): subtract spatial means to address correlated shocks
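A sketch of the incomplete-interactions idea, assuming a hypothetical ring of regions where each region has two neighbors, so \(G^2 \neq G\). In the model \(y = \rho G y + \beta x + \theta G x + \varepsilon\), neighbors' neighbors' characteristics \(G^2x\) affect \(y\) only through \(Gy\), so they can instrument \(Gy\) in a two-stage least squares estimate. With complete interactions inside groups (\(G\) a group-averaging matrix), \(G^2x\) would equal \(Gx\) and the instrument would vanish. Parameter values are assumptions; the manual second stage gives valid point estimates but not valid standard errors.

```r
set.seed(123)
n <- 5000                               # hypothetical number of regions on a ring
rho <- 0.4; beta <- 1; theta <- 0.5     # assumed true parameters

# Multiply a vector by a row-standardized ring G (two neighbors, weight 1/2 each)
G_times <- function(v) (c(v[n], v[-n]) + c(v[-1], v[1])) / 2

x   <- rnorm(n)
eps <- rnorm(n, sd = 0.5)
b   <- beta * x + theta * G_times(x) + eps

# Solve y = rho * G y + b by fixed-point iteration (converges since |rho| < 1)
y <- b
for (k in 1:200) y <- b + rho * G_times(y)

Gy  <- G_times(y)
Gx  <- G_times(x)
GGx <- G_times(Gx)    # neighbors' neighbors: excluded from the outcome equation

# First stage: project Gy on the instruments (1, x, Gx, G^2 x)
Z      <- cbind(1, x, Gx, GGx)
Gy_hat <- Z %*% qr.solve(Z, Gy)

# Second stage: least squares of y on (1, fitted Gy, x, Gx); point estimates only
coef_names <- c("const", "rho", "beta", "theta")
est_2sls <- setNames(as.vector(qr.solve(cbind(1, Gy_hat, x, Gx), y)), coef_names)
est_ols  <- setNames(as.vector(qr.solve(cbind(1, Gy, x, Gx), y)), coef_names)

round(rbind(true = setNames(c(0, rho, beta, theta), coef_names),
            ols = est_ols, iv = est_2sls), 3)
```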

Application: Chinese aid allocation

Dreher et al. (2019): Chinese foreign aid

Map of Chinese aid value