11. OCR: Digitized Data
Data Science for Economists
Julian Hinz
2026-03-01
Today’s plan
“Non-computable information”
Lloyd’s shipping list:
The Wind of Change: Maritime Technology, Trade, and Economic Development, Pascali (2017)
Plantation records:
The Development Effects of the Extractive Colonial Economy: The Dutch Cultivation System in Java, Dell and Olken (2020)
Clay tablets:
Trade, Merchants, and the Lost Cities of the Bronze Age, Barjamovic et al. (2019)
Non-computable information
Standard digitization methods often fail to capture historical documents effectively
Especially for less frequently used languages, scripts, and settings
Data may also be trapped in various types of images
Text data contains a significant amount of non-computable information
Economics and data
Key economic questions require disaggregated data: misallocation, inequality, social mobility, welfare effects of trade
Long-run disaggregated data are rarely available in digital form
Existing data predominantly originate from high-resource contexts
Growing academic interest, driven in part by much better computing power and methods
Digitizing data