07. Spatial & Satellite Data

Data Science for Economists

2026-03-01

Spatial Concepts & Applications

Location of patent inventors

Stek (2020)

Patterns in spatial data

Why are inventors concentrated in certain locations?

  1. Common influences
    • Quality of education, infrastructures, etc.
    • Economic activity, jobs, etc.
  2. Spillover effects
    • Social interactions
    • Information sharing
    • Skill hubs

Analysis of spatial data

  • Can you identify the two effects?
  • Idea: Estimate the following model via OLS

\[y_{ig} = \gamma x_{ig} + \beta m_y(y_g) + \delta m_x(x_g) + \epsilon_{ig}\]

  • \(y_{ig}\) is the number of patents of inventor \(i\) geolocalized in area \(g\)
  • \(x_{ig}\) are individual characteristics (e.g. age)
  • \(m_y, m_x\) are aggregations of variables that are spatially connected with location \(g\)
    • e.g. avg. age and avg. number of patents in the same location

The reflection problem

The average outcome of the group aggregates the outcomes (behaviours) of the group members, which are themselves driven by those members' individual characteristics. The group-mean outcome is therefore close to a linear function of the group-mean characteristics.

\(\rightarrow\) Multicollinearity
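A small simulation makes the problem concrete. The sketch below assumes a linear-in-means data-generating process with made-up parameter values (\(\gamma = 1\), \(\beta = 0.4\), \(\delta = 0.5\)) and uses plain group averages for \(m_y\) and \(m_x\); none of these choices come from the slides.

```r
set.seed(1)

# Hypothetical setup: 200 locations with 50 inventors each
n_groups <- 200
n_per    <- 50
group    <- rep(seq_len(n_groups), each = n_per)

# Assumed "true" parameters (illustrative values only)
gamma_true <- 1.0   # own characteristic
beta_true  <- 0.4   # endogenous effect (group-mean outcome)
delta_true <- 0.5   # contextual effect (group-mean characteristic)

# Individual characteristic with a location-specific mean
x   <- rnorm(n_groups * n_per, mean = rep(rnorm(n_groups), each = n_per))
eps <- rnorm(length(x))

# Solve the model for the group-mean outcome, then build y
xbar <- ave(x, group)
ebar <- ave(eps, group)
ybar_model <- ((gamma_true + delta_true) * xbar + ebar) / (1 - beta_true)
y <- gamma_true * x + beta_true * ybar_model + delta_true * xbar + eps

# Regress y on own x and the two group means
ybar <- ave(y, group)
fit  <- lm(y ~ x + ybar + xbar)

round(coef(fit), 3)
cor(ybar, xbar)   # the two group means are almost collinear
```

In this setup the OLS coefficient on the group-mean outcome equals 1 by construction, whatever the true \(\beta\) is, and the coefficient on the group-mean characteristic is minus the within-group slope. Once both group means are included, the regression only uses within-group variation, so it says nothing about spillovers.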

Content of spatial data

  • Position data in 2D (or 3D) (location of inventors)
    • sometimes extended entities (lines, polygons) rather than points
  • Attribute data (number of patents)
  • Metadata related to the position data (characteristics of location)

Units of geographical space

Quick start: reading and plotting spatial data

library(sf)
library(ggplot2)
library(rnaturalearth)

# Load country boundaries
world <- ne_countries(scale = "medium", returnclass = "sf")

# Plot a simple map
ggplot(world) +
  geom_sf(fill = "lightgray", color = "white") +
  theme_minimal() +
  labs(title = "World countries (Natural Earth)")

Measuring spatial concentration

How to measure concentration of patents across regions in a country?

  • Krugman specialization/concentration index

\[\text{Conc} = \sum_{g=1}^{n} |s_g - s|\]

where \(s_g\) is the number of patents per capita in region \(g \in \{1, \dots, n\}\), and \(s\) is the number of patents per capita in the whole economy

  • Spatial Gini Index

Gini Index

The standard Gini index ranks people by income; here we instead rank regions by number of patents.

It equals half the relative mean absolute difference:

\[G = \frac{\sum_{g=1}^{n}\sum_{j=1}^{n} |x_g - x_j|}{2n^2 \bar{x}}\]

where \(x_g\) is the number of patents in region \(g\).
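Both measures take only a few lines of base R. The sketch below uses four hypothetical regions with invented patent counts and equal populations, and implements the two formulas exactly as written on the slides.

```r
# Toy data: patents and population for four hypothetical regions
patents    <- c(5, 10, 15, 70)
population <- c(100, 100, 100, 100)

# Krugman-style index: sum of absolute deviations of regional
# patents per capita (s_g) from the economy-wide rate (s)
s_g <- patents / population
s   <- sum(patents) / sum(population)
krugman <- sum(abs(s_g - s))

# Gini index: sum of all pairwise absolute differences,
# scaled by 2 * n^2 * mean(x)
gini <- function(x) {
  n <- length(x)
  sum(abs(outer(x, x, "-"))) / (2 * n^2 * mean(x))
}

krugman        # 0.9
gini(patents)  # 0.5
```

With equal patent counts in every region both indices are zero; they grow as patents pile up in a few regions.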

Spatial decomposition of the Gini coefficient

Key idea (Rey & Smith 2013): decompose inequality into a neighbor component and a non-neighbor component. If inequality among neighbors is lower than among non-neighbors → positive spatial autocorrelation (similar regions cluster).
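A minimal sketch of the decomposition, assuming a binary neighbor matrix `W` (1 if two regions are neighbors, 0 otherwise). The four regions, their values, and the "regions on a line" contiguity pattern are invented for illustration.

```r
# Hypothetical patent counts for four regions
x <- c(5, 10, 15, 70)
n <- length(x)

# Assumed contiguity: regions arranged on a line (1-2, 2-3, 3-4)
W <- matrix(0, n, n)
W[cbind(1:3, 2:4)] <- 1
W <- W + t(W)   # make the neighbor relation symmetric

# Pairwise absolute differences and the usual Gini denominator
d     <- abs(outer(x, x, "-"))
denom <- 2 * n^2 * mean(x)

gini_total        <- sum(d) / denom
gini_neighbor     <- sum(W * d) / denom        # pairs that are neighbors
gini_non_neighbor <- sum((1 - W) * d) / denom  # all other pairs

c(total = gini_total,
  neighbor = gini_neighbor,
  non_neighbor = gini_non_neighbor)
```

The two components add up to the overall Gini. To judge whether the neighbor component is unusually small, one option is to compare it with its distribution under random reshuffling of the values across regions.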

Non-randomness in spatial data

Complete Random Allocation in 2D

Incomplete Random Allocation in 2D

The economics of spatial non-randomness

  1. Random allocation, characteristics of location vary
    • Farmers randomly allocated, but crops depend on soil etc. (Holmes and Lee, 2012)
  2. Non-random allocation, location characteristics have no causal effect
    • R&D in Silicon Valley (Ellison and Glaeser, 1997)
  3. Random allocation, interactions matter
    • College dormitory allocation and peer effects in choice of majors (Sacerdote, 2001)
  4. Non-random allocation, interactions matter
    • Childhood neighborhood effects (Gibbons, 2013)

Spatial models: the key challenge

With spatial interconnection matrix \(G\), a general spatial model includes:

  • Endogenous effects (\(Gy\)): neighbors’ outcomes affect yours
  • Contextual effects (\(GX\)): neighbors’ characteristics affect yours
  • Correlated effects (\(Gv\)): shared unobservables

Model   What it includes
SAR     Only endogenous spatial lag (\(Gy\))
SLX     Only contextual effects (\(GX\))
SDM     Both endogenous + contextual
SEM     Spatial structure in errors only
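
In matrix form, the four models can be read as restrictions on one nesting specification. The coefficient symbols \(\rho\), \(\theta\), and \(\lambda\) are notation assumed here for illustration and do not appear on the slides; the error process is written in the common spatial-error form:

\[y = \rho\, G y + X\beta + G X \theta + u, \qquad u = \lambda\, G u + \epsilon\]

  • SAR: \(\theta = 0\), \(\lambda = 0\)
  • SLX: \(\rho = 0\), \(\lambda = 0\)
  • SDM: \(\lambda = 0\)
  • SEM: \(\rho = 0\), \(\theta = 0\)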

The reflection problem (Manski, 1993)

Core issue: endogenous, contextual, and correlated effects cannot be separately identified from the reduced form — OLS confounds all three.

Solutions:

  • Non-linear functional forms (Brock & Durlauf 2001)
  • Exclusion restrictions (assume away some channels)
  • Incomplete interactions (\(GG \neq G\))
  • Spatial differencing (Holmes 1998): subtract spatial means to address correlated shocks

Application: Chinese aid allocation

Dreher et al. (2019): Chinese foreign aid

Map of Chinese aid value

Map of leaders’ birthplaces

Empirical strategy

Do current political leaders’ birthplaces matter for the allocation of Chinese aid?

\[\text{Aid}_{ict} = \alpha + \gamma\, \text{Birthregion}_{ict} + \epsilon_{ict}\]

where \(\text{Birthregion}_{ict} = 1\) if the political leader of country \(c\) in year \(t\) was born in administrative region \(i\), and 0 otherwise.

Problems? They apply spatial differencing, adding country-year (\(\alpha_{ct}\)) and region (\(\delta_{ic}\)) fixed effects:

\[\text{Aid}_{ict} = \color{red}{\alpha_{ct} + \delta_{ic}} + \sum_j \beta_j X^j_{ic} + \gamma\, \text{Birthregion}_{ict} + \epsilon_{ict}\]
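
To see why the fixed effects matter, the following simulation sketch compares the naive regression with the fixed-effects one. Everything in it is invented for illustration: the panel dimensions, the true birth-region effect of 0.5, and the assumption that leaders are more likely to come from regions that receive more aid anyway. It is not a replication of Dreher et al. (2019).

```r
set.seed(42)

# Hypothetical panel: 10 countries x 5 regions x 20 years
d <- expand.grid(region = 1:5, country = 1:10, year = 1:20)
d$region_id    <- interaction(d$country, d$region, drop = TRUE)
d$country_year <- interaction(d$country, d$year, drop = TRUE)

# Unobserved region effect (delta_ic) and country-year shock (alpha_ct)
delta <- rnorm(nlevels(d$region_id))
alpha <- rnorm(nlevels(d$country_year))
d$delta <- delta[as.integer(d$region_id)]
d$alpha <- alpha[as.integer(d$country_year)]

# One birth region per country-year; assumed to be more likely
# in regions with a high delta (this creates the confounding)
d$birth <- ave(d$delta, d$country_year, FUN = function(z) {
  as.numeric(seq_along(z) == sample(length(z), 1, prob = exp(2 * z)))
})

# Aid with an assumed true birth-region effect of 0.5
d$aid <- d$alpha + d$delta + 0.5 * d$birth + rnorm(nrow(d), sd = 0.5)

# Naive OLS versus country-year and region fixed effects
naive <- coef(lm(aid ~ birth, data = d))[["birth"]]
fe    <- coef(lm(aid ~ birth + country_year + region_id, data = d))[["birth"]]

c(naive = naive, fixed_effects = fe)
```

The naive coefficient also picks up the fact that birth regions differ systematically from other regions, while the fixed-effects estimate relies on changes in leadership within the same region over time.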

Results: birth region effects

Satellite Imagery & Remote Sensing

A brief history of satellite imagery

  • 1946: First image from space — sub-orbital V-2 rocket flight, October 24
  • 1959: First satellite image — U.S. Explorer 6, August 14
  • Primary use-case: Spying
  • Now: ~1,800+ (known) Earth observation satellites in orbit (UCS Satellite Database, 2025)
  • Public: Landsat (30m since early 1980s), ESA Sentinel, NASA MODIS (36 spectral bands since 2000)
  • Private: GeoEye, Maxar, Planet — up to 0.31m spatial resolution

Satellite image examples: livestock monitoring

Argentina

Deforestation tracking: Chiribiquete, Colombia

2017

2018

Deforestation tracking: Chiribiquete, Colombia (cont.)

2020

2021

COVID lockdowns: New Delhi air quality

April 2019 (before)

April 2020 (lockdown)

Conflict monitoring: Berdiansk, Ukraine

March 22, 2022

March 24, 2022

Use of satellite data in economic analysis

  • Night lights, pollution, …
  • Precipitation, wind speed, flooding, topography, …
  • Forest cover, crop choice, agricultural productivity, fish abundance, …
  • Urban development, building type, roads, beach quality, …

Advantages: availability and objectivity

  • Burgess et al. (2012): political economy of deforestation
    • Official administrative data may be distorted by bribery or incentives to misreport
  • Jayachandran (2009): impact of air pollution on fetal, infant, and child mortality
    • The 1997 Indonesian forest fires led to an estimated 15,600 fetal, infant, and child deaths

Advantages: high resolution

  • Publicly available satellite imagery: 30m grid cells or better
  • Private data: better than 0.5m
  • Marx, Stoker, and Suri (2015): reflectivity as proxy for dwelling investments in a Nairobi slum
    • Role of ethnic favoritism in residential markets
  • Others: count cars in parking lots, measure automobile traffic, count crowds at political rallies, …

Advantages: global coverage

  • Henderson, Storeygard, and Weil (2012): nighttime lights as a proxy for settlement patterns and wealth at high spatial resolution, globally
  • Dingel, Miscio and Davis (2021): for developing countries, lights-based metropolitan populations follow a power law, while administrative units do not

Technicalities: orbits and sensors (overview)

  • Geostationary orbit (~36,000 km): continuous coverage, but lower resolution
  • Sun-synchronous orbit (<6,000 km): high resolution, consistent lighting
  • Sensors capture different electromagnetic bands (visible, infrared, microwave) → vegetation, temperature, moisture, nightlights
  • Raw imagery requires processing: cloud removal, atmospheric correction, orthorectification

Coordinate Reference Systems

Projections go back and forth between:

  • Ellipsoidal coordinates — degrees latitude and longitude — pointing to locations on a shape approximating the Earth (an ellipsoid)
  • Projected coordinates — flat, two-dimensional coordinate system used when plotting maps

Explore projections: geo-projections.com
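
In sf, moving between the two kinds of coordinates is a single call to `st_transform()`. The sketch below uses the North Carolina counties shapefile that ships with sf; the target CRS, EPSG:32119 (NAD83 / North Carolina, in metres), is an illustrative choice.

```r
library(sf)

# Example polygons bundled with the sf package
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

st_crs(nc)$epsg    # EPSG code of the CRS stored in the file
st_is_longlat(nc)  # TRUE: coordinates are degrees on an ellipsoid

# Project to a flat coordinate system measured in metres
nc_proj <- st_transform(nc, 32119)
st_is_longlat(nc_proj)  # FALSE

# Same counties, different coordinate values
st_bbox(nc)
st_bbox(nc_proj)
```

Distances, areas, and buffers are usually easier to reason about in a projected CRS with metric units.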

CRS overview

Alaska in 4 projections

Nightlights and GDP

Henderson, Storeygard, and Weil (2012)

“Measuring Economic Growth from Outer Space”

  • Long-run lights-GDP relationship: correlation coefficient of 0.53
  • Long-run lights-GDP elasticity of 0.28 to 0.32
    • No evidence of non-linearity or asymmetry between increases and decreases
  • Structural elasticity of lights growth with respect to GDP growth: between 1.0 and 1.7

Many nightlight papers

  • Hodler and Raschky (2014): Regional favoritism — stronger growth in origin region of ruler
  • Bluhm and Krause (2022): Top lights — top-coding leads to underestimating growth of African cities
  • Lee (2016): International isolation and regional inequality — nightlights to estimate economic activity in North Korea, whose government produces no credible economic statistics

Visible spectrum applications

  • von Carnap (2022): remotely-sensed market activity as short-run economic indicator in rural developing areas
  • Engstrom et al. (2022): combining nightlight data with visible spectrum imagery to predict poverty

Building detection

Car counting

von Carnap (2022): market activity from space

Remotely-sensed market activity as short-run economic indicator in rural developing areas

Concept

Results

von Carnap (2022): validation

Practical R Code with sf and terra

Data models: vector and raster

Vector data — points, lines, polygons (the sf package)

Raster data — regular grids of values (the terra package)

sf — simple features

Low-level libraries for geocomputation:

  • GDAL: reading, writing, and manipulating geographic data formats
  • PROJ: coordinate system transformations
  • GEOS: planar geometry engine (buffers, centroids, etc.)
  • S2: spherical geometry engine (C++, developed by Google)

Key properties of sf:

  • Objects can be treated as data frames
  • Function names are consistent (all begin with st_)
  • Works well with tidyverse and the |> pipe

sf classes

Working with sf: reading and inspecting
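
A minimal sketch of reading and inspecting vector data. It uses the North Carolina counties file bundled with sf as a stand-in for your own shapefile or GeoPackage.

```r
library(sf)

# st_read() goes through GDAL, so it handles most vector formats
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

class(nc)   # an sf object is also a data.frame
nrow(nc)    # one row per county
names(nc)   # attribute columns plus a geometry column

# Geometry type and coordinate reference system
st_geometry_type(nc, by_geometry = FALSE)
st_crs(nc)

# Attribute data without the geometry column
head(st_drop_geometry(nc)[, c("NAME", "BIR74")])
```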

Working with sf: spatial operations
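
A short sketch of common spatial operations, again on the bundled North Carolina counties. The 50 km radius and the choice of Wake County are arbitrary illustrations.

```r
library(sf)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Work in a projected CRS so that distances are in metres
nc_proj <- st_transform(nc, 32119)

# Area of each county in square kilometres
nc_proj$area_km2 <- as.numeric(st_area(nc_proj)) / 1e6

# 50 km buffer around the centroid of one county
wake <- nc_proj[nc_proj$NAME == "Wake", ]
buf  <- st_buffer(st_centroid(st_geometry(wake)), dist = 50000)

# Counties that intersect the buffer
hits   <- st_intersects(nc_proj, buf, sparse = FALSE)[, 1]
nearby <- nc_proj[hits, ]
nearby$NAME

# Contiguity: which counties share a border with which
touch <- st_touches(nc_proj, sparse = FALSE)
n_neighbors <- rowSums(touch)
summary(n_neighbors)
```

A contiguity matrix like `touch` is one candidate for the spatial interconnection matrix \(G\) used in spatial models.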

terra for raster data

  • terra is a reboot of the raster package — significantly faster
  • Many interfaces between terra and sf
  • Alternative: stars

Raster: multi-layer

Working with terra: reading and inspecting
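
A minimal sketch of reading and inspecting a raster. The elevation file bundled with terra stands in for a satellite product, and the second layer is constructed only to show how multi-layer rasters behave.

```r
library(terra)

# Single-layer raster shipped with the terra package
elev <- rast(system.file("ex/elev.tif", package = "terra"))

elev        # summary: dimensions, resolution, extent, CRS
dim(elev)   # rows, columns, layers
res(elev)   # cell size in CRS units
ext(elev)   # bounding box
crs(elev, describe = TRUE)

# Summary statistic over all cells
global(elev, "mean", na.rm = TRUE)

# Multi-layer raster: combine layers with c()
two_layers <- c(elev, elev * 2)
names(two_layers) <- c("elev", "elev_x2")
nlyr(two_layers)
```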

Working with terra: operations
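
A sketch of typical raster operations, using the elevation raster and the Luxembourg polygons bundled with terra. The aggregation factor of 4 is an arbitrary choice.

```r
library(terra)

elev <- rast(system.file("ex/elev.tif", package = "terra"))
lux  <- vect(system.file("ex/lux.shp", package = "terra"))

# Crop to the bounding box of one polygon, then mask cells outside it
one_region <- lux[1, ]
elev_crop  <- crop(elev, one_region)
elev_mask  <- mask(elev_crop, one_region)

# Coarsen the grid: average blocks of 4 x 4 cells
elev_coarse <- aggregate(elev, fact = 4, fun = "mean", na.rm = TRUE)

# Zonal statistics: mean raster value within each polygon
zonal <- extract(elev, lux, fun = mean, na.rm = TRUE)
head(zonal)
```

Zonal statistics are usually the step that turns gridded satellite data into a region-level variable for regressions.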

Accessing satellite data: the rsi package

The rsi package provides access to STAC catalogs (SpatioTemporal Asset Catalogs) for downloading satellite imagery directly into R.

Mapping with ggplot2 + geom_sf()
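
A minimal choropleth sketch: mapping an attribute to `fill` inside `geom_sf()`. The data are the North Carolina counties bundled with sf; the variable, projection, and colour scale are illustrative choices.

```r
library(sf)
library(ggplot2)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Colour each county by an attribute column
p <- ggplot(nc) +
  geom_sf(aes(fill = BIR74), color = "white") +
  scale_fill_viridis_c(name = "BIR74") +
  coord_sf(crs = st_crs(32119)) +   # draw in a projected CRS
  theme_minimal() +
  labs(title = "North Carolina counties (sf example data)")

p
```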

Quick end-to-end workflow: Satellite → Regression
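
The steps can be sketched with data bundled in terra. This is a toy illustration only: the elevation raster stands in for a satellite layer such as nightlights, the Luxembourg polygons stand in for administrative regions, and the regression has no economic interpretation.

```r
library(terra)

# 1. Load a raster (stand-in for a satellite product) and region polygons
sat     <- rast(system.file("ex/elev.tif", package = "terra"))
regions <- vect(system.file("ex/lux.shp", package = "terra"))

# 2. Zonal statistics: average raster value per region
z <- extract(sat, regions, fun = mean, na.rm = TRUE)

# 3. Combine with region-level attributes (POP and AREA come with the file)
df <- data.frame(as.data.frame(regions), sat_mean = z[[2]])
df$pop_density <- df$POP / df$AREA

# 4. Region-level regression
fit <- lm(log(pop_density) ~ sat_mean, data = df)
summary(fit)$coefficients
```

With real data, step 1 would load a downloaded satellite product and administrative boundaries, and step 3 would merge in survey or administrative outcomes by region identifier.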

Where to get free satellite data

Source                            Data                          Resolution   Access
USGS EarthExplorer                Landsat, MODIS                30m / 250m   earthexplorer.usgs.gov
Copernicus Data Space Ecosystem   Sentinel-1/2/3/5P             10m–1km      dataspace.copernicus.eu
MS Planetary Computer             STAC catalog (many sources)   Varies       planetarycomputer.microsoft.com
Google Earth Engine               900+ datasets                 Varies       earthengine.google.com
VIIRS nightlights                 Annual/monthly composites     500m         eogdata.mines.edu

Further reading

  • Donaldson & Storeygard (2016), “The View from Above: Applications of Satellite Data in Economics”, JEP.
  • World Bank Open Nighttime Lights tutorials: worldbank.github.io/OpenNightLights
  • PLOS ONE (2025), “Shedding Light on Development: Leveraging the New Nightlights Data”