06 — Text as Data
Text mining, sentiment analysis, and dictionary methods.

This module covers text analysis: preprocessing, tokenization, tf-idf, dictionary methods, and applications including the Economic Policy Uncertainty Index.
Lecture slides
Code
Check the course repository for the R scripts.
Key packages: quanteda, tidytext, stringr
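As a rough orientation for the topics in this module, the sketch below runs preprocessing, tokenization, tf-idf weighting, and a dictionary lookup with quanteda. It is not one of the course scripts: the three documents and the term lists are invented for illustration, and the final step only mimics the spirit of the Economic Policy Uncertainty approach (flagging texts that contain a term from every category), not the term sets used by Baker, Bloom, and Davis.

```r
library(quanteda)

# Toy documents, invented for illustration only
docs <- c(
  d1 = "Uncertainty about economic policy rose sharply.",
  d2 = "The economy grew and markets were calm.",
  d3 = "Congress debated new regulation amid uncertain growth."
)

# Preprocessing: tokenize, drop punctuation, lowercase, remove stopwords
toks <- tokens(docs, remove_punct = TRUE)
toks <- tokens_tolower(toks)
toks <- tokens_remove(toks, stopwords("en"))

# Document-feature matrix, then tf-idf weighting
dfmat <- dfm(toks)
dfmat_tfidf <- dfm_tfidf(dfmat)

# Dictionary method: hypothetical term lists (glob patterns), one per category
dict <- dictionary(list(
  uncertainty = c("uncertain*"),
  economy     = c("econom*"),
  policy      = c("congress", "regulation", "policy")
))
counts <- dfm_lookup(dfmat, dictionary = dict)

# Flag documents that match at least one term in every category
count_mat <- as.matrix(counts)
hits_all  <- rowSums(count_mat > 0) == ncol(count_mat)

print(count_mat)
print(hits_all)
```

Only the first toy document matches all three categories. In an application, the share of flagged articles per period would be the raw ingredient for an index; the scaling and normalization steps are left out here.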
Further resources
- Gentzkow, Kelly, and Taddy (2019): Text as Data (JEL)
- Baker, Bloom, and Davis (2016): Measuring Economic Policy Uncertainty (QJE)
- Silge and Robinson (2017): Text Mining with R (free online book)
- quanteda tutorials