\(y_{ig}\) is the number of patents of inventor \(i\) geolocated in area \(g\)
\(x_{ig}\) are individual characteristics (e.g. age)
\(m_y, m_x\) are aggregations of variables over units spatially connected with location \(g\)
e.g. \(m_x\) is the average age and \(m_y\) the average number of patents of inventors in the same location
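One way to compute \(m_y\) and \(m_x\) is through an interaction matrix \(G\). The R sketch below makes two assumptions: "spatially connected" means "in the same location", and averages are taken over the *other* inventors there (leave-one-out). The toy data and the helper `build_G` are hypothetical.

```r
# Hypothetical toy data: five inventors in two locations
inventors_df <- data.frame(
  location = c("A", "A", "A", "B", "B"),
  patents  = c(1, 2, 3, 10, 20),
  age      = c(30, 40, 50, 35, 45)
)

# Leave-one-out interaction matrix G (hypothetical helper):
# g_ij = 1 / (n_g - 1) if i and j share a location and i != j, 0 otherwise
build_G <- function(location) {
  same <- outer(location, location, "==") * 1  # 1 if same location
  diag(same) <- 0                               # exclude the inventor herself
  rs <- rowSums(same)
  rs[rs == 0] <- 1                              # lone inventors get a row of zeros
  same / rs                                     # row-standardize
}

G <- build_G(inventors_df$location)
inventors_df$m_y <- as.vector(G %*% inventors_df$patents)  # avg patents of others in g
inventors_df$m_x <- as.vector(G %*% inventors_df$age)      # avg age of others in g
inventors_df
```

Row-standardizing makes each row of \(G\) sum to one, so \(Gy\) is an average rather than a sum. Replacing the same-location rule with a distance or contiguity rule gives other spatial interconnection matrices.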
The reflection problem
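The average outcome of the group aggregates the outcomes of the other group members. Those outcomes are themselves driven by the members' individual characteristics, so \(m_y\) largely reproduces an aggregation of individual characteristics over the same members, i.e. \(m_x\)
\(\rightarrow\) Multicollinearity between \(m_y\) and \(m_x\): endogenous and contextual effects cannot be separated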
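To see why, take the linear-in-means specification below as an illustrative assumption. The coefficients \(\alpha, \beta, \gamma, \delta\) are introduced here for the derivation and are not taken from the slides. Here \(m_{y,g}\) and \(m_{x,g}\) are the mean outcome and mean characteristic in location \(g\), and \(E[\varepsilon_{ig} \mid g] = 0\):
\[y_{ig} = \alpha + \beta\, m_{y,g} + \gamma\, x_{ig} + \delta\, m_{x,g} + \varepsilon_{ig}\]
Averaging both sides over the inventors in \(g\) gives \(m_{y,g} = \alpha + \beta\, m_{y,g} + (\gamma + \delta)\, m_{x,g}\), up to a group-mean error that vanishes in large groups. Solving for \(\beta \neq 1\) yields:
\[m_{y,g} = \frac{\alpha}{1-\beta} + \frac{\gamma + \delta}{1-\beta}\, m_{x,g}\]
So \(m_{y,g}\) is a linear function of \(m_{x,g}\). Substituting back gives \(y_{ig} = \frac{\alpha}{1-\beta} + \gamma\, x_{ig} + \frac{\beta\gamma + \delta}{1-\beta}\, m_{x,g}\), which identifies \(\gamma\) and the composite coefficient on \(m_{x,g}\), but not \(\beta\) and \(\delta\) separately.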
Content of spatial data
Position data in 2D (or 3D) (location of inventors)
sometimes the entities are polygons (areas) rather than points
Attribute data (number of patents)
Metadata related to the position data (characteristics of location)
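A minimal sf sketch of how these pieces sit together in one object: point positions, an attribute column, and a coordinate reference system attached to the positions. The inventor IDs, patent counts, and coordinates are hypothetical.

```r
library(sf)

# Hypothetical inventor locations (longitude, latitude) with one attribute
inventors <- st_sf(
  inventor = c("i1", "i2", "i3"),
  patents  = c(3, 1, 5),                # attribute data
  geometry = st_sfc(                    # position data (points)
    st_point(c(7.69, 45.07)),
    st_point(c(9.19, 45.46)),
    st_point(c(11.34, 44.49)),
    crs = 4326                          # CRS (WGS84 lon/lat): metadata about the positions
  )
)
print(inventors)
```

Characteristics of each location (e.g. regional statistics) would typically be added as further attribute columns, or joined in from a polygon layer.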
Units of geographical space
Quick start: reading and plotting spatial data
```r
library(sf)
library(ggplot2)
library(rnaturalearth)  # scale = "medium" also needs the rnaturalearthdata package

# Load country boundaries as an sf data frame
world <- ne_countries(scale = "medium", returnclass = "sf")

# Plot a simple map
ggplot(world) +
  geom_sf(fill = "lightgray", color = "white") +
  theme_minimal() +
  labs(title = "World countries (Natural Earth)")
```
Measuring spatial concentration
How to measure concentration of patents across regions in a country?
Krugman specialization/concentration index
\[\text{Conc} = \sum_{g=1}^{n} |s_g - s|\]
where \(s_g\) is the number of patents per capita in region \(g\), with \(g \in \{1, \dots, n\}\), and \(s\) is the number of patents per capita in the whole economy
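A minimal R sketch of the index exactly as defined above. The function name `krugman_conc` and the toy numbers are hypothetical. The index is 0 when every region has the national patent rate, and it grows with regional deviations from that rate.

```r
# Concentration index as defined above: Conc = sum_g |s_g - s|
krugman_conc <- function(patents, population) {
  s_g <- patents / population             # patents per capita in each region
  s   <- sum(patents) / sum(population)   # patents per capita in the whole economy
  sum(abs(s_g - s))
}

# Hypothetical example: three regions of equal population
krugman_conc(patents = c(10, 20, 30), population = c(10, 10, 10))  # returns 2
```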
Spatial Gini Index
Gini Index
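The classic Gini ranks people by income; the spatial version ranks regions by number of patents instead
It equals half the relative mean absolute difference:
\[G = \frac{\sum_{g=1}^{n}\sum_{h=1}^{n} |x_g - x_h|}{2 n^2 \bar{x}}\]
where \(x_g\) is the number of patents in region \(g\), \(n\) is the number of regions, and \(\bar{x}\) is the mean number of patents per region.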
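A minimal R sketch, assuming the Gini is computed as half the relative mean absolute difference over all ordered pairs of regions. The function name `gini` and the inputs are hypothetical.

```r
# G = sum_g sum_h |x_g - x_h| / (2 * n^2 * mean(x))
gini <- function(x) {
  n <- length(x)
  abs_diffs <- abs(outer(x, x, "-"))  # n x n matrix of |x_g - x_h|
  sum(abs_diffs) / (2 * n^2 * mean(x))
}

gini(c(5, 5, 5, 5))  # 0: patents spread evenly
gini(c(0, 0, 0, 4))  # 0.75: all patents in one of four regions, the maximum (n - 1) / n
```

The pairwise form is O(n^2) in memory, which is fine for a few hundred regions. The pairwise matrix is also what the spatial decomposition splits up.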
Spatial decomposition of the Gini coefficient
Key idea (Rey & Smith 2013): split the Gini coefficient into a neighbor component (pairs of neighboring regions) and a non-neighbor component (all other pairs). If inequality among neighbors is lower than among non-neighbors → positive spatial autocorrelation (similar regions cluster).
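A minimal R sketch of the decomposition, assuming a binary contiguity matrix \(W\) with \(w_{gh} = 1\) if regions \(g\) and \(h\) are neighbors. Each pairwise term \(|x_g - x_h|\) goes either to the neighbor part (weighted by \(w_{gh}\)) or to the non-neighbor part (weighted by \(1 - w_{gh}\)), so the two parts add up to the Gini. The function name and example are hypothetical.

```r
gini_spatial_decomp <- function(x, W) {
  n <- length(x)
  abs_diffs <- abs(outer(x, x, "-"))      # |x_g - x_h| for all ordered pairs
  denom <- 2 * n^2 * mean(x)
  neighbor     <- sum(W * abs_diffs) / denom
  non_neighbor <- sum((1 - W) * abs_diffs) / denom
  c(gini = neighbor + non_neighbor, neighbor = neighbor, non_neighbor = non_neighbor)
}

# Hypothetical example: four regions along a line, each bordering the next
x <- c(1, 2, 3, 4)
W <- matrix(0, 4, 4)
W[cbind(1:3, 2:4)] <- 1
W <- W + t(W)                             # symmetric contiguity, zero diagonal
gini_spatial_decomp(x, W)                 # gini 0.25, neighbor 0.075, non_neighbor 0.175
```

The two components cover different numbers of pairs, so their raw sizes are not directly comparable. Rey and Smith judge the neighbor component against its distribution under random permutations of \(x\) across regions.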
Non-randomness in spatial data
Complete Random Allocation in 2D
Incomplete Random Allocation in 2D
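A simulation sketch of the contrast. Two assumptions apply here. "Incomplete" allocation is interpreted as points clustering around a few centers. The variance-to-mean ratio of counts on a grid (a standard quadrat-count diagnostic) is swapped in as the measure; it is not taken from the slides. All parameters are hypothetical.

```r
set.seed(42)
n_points <- 1000

# Complete random allocation: each point placed uniformly in the unit square
random_pts <- data.frame(x = runif(n_points), y = runif(n_points))

# Hypothetical clustered allocation: points scattered around 5 random centers
centers <- data.frame(x = runif(5, 0.2, 0.8), y = runif(5, 0.2, 0.8))
k <- sample(1:5, n_points, replace = TRUE)
clustered_pts <- data.frame(
  x = pmin(pmax(centers$x[k] + rnorm(n_points, sd = 0.03), 0), 1),
  y = pmin(pmax(centers$y[k] + rnorm(n_points, sd = 0.03), 0), 1)
)

# Variance-to-mean ratio of counts on an n_cells x n_cells grid:
# about 1 under complete spatial randomness, well above 1 under clustering
vmr <- function(pts, n_cells = 10) {
  breaks <- seq(0, 1, length.out = n_cells + 1)
  cx <- cut(pts$x, breaks = breaks, include.lowest = TRUE)
  cy <- cut(pts$y, breaks = breaks, include.lowest = TRUE)
  counts <- as.vector(table(cx, cy))      # includes empty cells
  var(counts) / mean(counts)
}

vmr(random_pts)
vmr(clustered_pts)
```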
The economics of spatial non-randomness
Random allocation, characteristics of location vary
Farmers randomly allocated, but crops depend on soil etc. (Holmes and Lee, 2012)
Non-random allocation, location characteristics have no causal effect
R&D in Silicon Valley (Ellison and Glaeser, 1997)
Random allocation, interactions matter
College dormitory allocation and peer effects in choice of majors (Sacerdote, 2001)
Non-random allocation, interactions matter
Childhood neighborhood effects (Gibbons, 2013)
Spatial models: the key challenge
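With spatial interconnection matrix \(G\), a general spatial model can include endogenous effects (the spatially lagged outcome \(Gy\), i.e. \(m_y\)), contextual effects (the spatially lagged characteristics \(GX\), i.e. \(m_x\)) and correlated effects (spatially correlated errors):
\[y = \rho G y + X\beta + G X \theta + u, \qquad u = \lambda G u + \varepsilon\]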
Long-run lights-GDP relationship: correlation coefficient of 0.53
Long-run lights-GDP elasticity of 0.28 to 0.32
No evidence of non-linearity or asymmetry between increases and decreases
Structural elasticity of lights growth with respect to GDP growth: between 1.0 and 1.7
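An elasticity is the slope of a log-log regression. The sketch below uses synthetic data with a built-in elasticity of 0.3 and shows OLS recovering it. The direction (log GDP on log lights), the data, and the noise levels are assumptions for illustration, not the estimator behind the figures above.

```r
set.seed(1)
n <- 1000
log_lights <- rnorm(n, mean = 2, sd = 1)
# Hypothetical synthetic data generated with a true elasticity of 0.3
log_gdp <- 1 + 0.3 * log_lights + rnorm(n, sd = 0.1)

fit <- lm(log_gdp ~ log_lights)
coef(fit)[["log_lights"]]   # close to 0.3: a 1% rise in lights goes with about 0.3% more GDP
```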
Many nightlight papers
Hodler and Raschky (2014): Regional favoritism — stronger growth in origin region of ruler
Bluhm and Krause (2022): Top lights — top-coding leads to underestimating growth of African cities
Lee (2016): International isolation and regional inequality — nightlights to estimate economic activity in North Korea, whose government produces no credible economic statistics
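The top-coding mechanism fits in a few lines. DMSP nightlight values are digital numbers capped at 63, so growth in already-saturated areas is cut off. The brightness levels and the 20% growth rate are hypothetical.

```r
# DMSP digital numbers are capped at 63 (top-coding)
top_code <- function(dn) pmin(dn, 63)

lights_t0 <- c(20, 40, 60)       # hypothetical: dim, medium, bright area
lights_t1 <- lights_t0 * 1.2     # true 20% growth everywhere

measured_growth <- top_code(lights_t1) / top_code(lights_t0)
measured_growth                  # 1.20 1.20 1.05: growth of the brightest area is understated
```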
Visible spectrum applications
von Carnap (2022): remotely-sensed market activity as short-run economic indicator in rural developing areas
Engstrom et al. (2022): combining nightlight data with visible spectrum imagery to predict poverty
Building detection
Car counting
von Carnap (2022): market activity from space
Remotely-sensed market activity as short-run economic indicator in rural developing areas
Concept
Results
von Carnap (2022): validation
Practical R Code with sf and terra
Data models: vector and raster
Vector data — points, lines, polygons (the sf package)
Raster data — regular grids of values (the terra package)
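A minimal terra sketch of the raster model: a regular grid where every cell holds a value. The 10 x 10 grid over a unit extent and the cell values 1 to 100 are hypothetical.

```r
library(terra)

# Hypothetical 10 x 10 raster over a unit extent, cells filled with 1..100
r <- rast(nrows = 10, ncols = 10, xmin = 0, xmax = 1, ymin = 0, ymax = 1, vals = 1:100)
r                                  # prints dimensions, resolution, extent
global(r, "mean")                  # mean over all cells: 50.5
```

A nightlight image is the same structure: each cell stores the brightness measured over that patch of ground.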
sf — simple features
Low-level libraries for geocomputation:
GDAL: reading, writing, and manipulating geographic data formats