Data Science for Economists
2026-03-01
Non-computable information
Source: Melissa Dell — “OCR and Record Linkage”, April 2023
library(tesseract)
# Basic OCR on an image
text <- ocr("path/to/scanned_page.png")
cat(text)
# With an explicit language engine ("eng" is the default; other languages must first be installed with tesseract_download())
eng <- tesseract("eng")
text <- ocr("path/to/document.png", engine = eng)
# Get word-level bounding boxes (useful for structured extraction)
words <- ocr_data("path/to/document.png")
head(words)  # columns: word, confidence, bbox
Wind of Change — Lloyd’s list
Dutch Colonies
Gravity with Clay Tablets