Data Science for Economists
2026-03-01
By the end of today you should be able to:
- Use LLM APIs (via the ellmer package) for classification, extraction, and annotation tasks.

Motivation
Economic upside
Better text representations often improve predictive power (e.g. central-bank tone \(\to\) yields) and enable causal designs that exploit semantic shifts (e.g. narrative shocks).
From counts to dense vectors
Fixes on top of BoW (topic models, n-grams, dependency parsing) help but remain brittle, high-dimensional, and context-insensitive.
“You shall know a word by the company it keeps.” – J. R. Firth
Key idea
Move beyond which words occur to where words live in a low-dimensional space. Build a co-occurrence matrix and factorise it so that similar words sit close together.
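The key idea can be sketched in a few lines of base R: build a word–word co-occurrence matrix from a toy corpus, factorise it with SVD, and keep the first \(K\) singular dimensions as dense word vectors. The corpus, window size, and \(K\) below are illustrative assumptions, not a recipe.

```r
# Toy corpus: each document is a character vector of tokens
docs <- list(
  c("rates", "rise", "inflation", "rise"),
  c("rates", "fall", "inflation", "fall"),
  c("growth", "slows", "inflation", "rise")
)

vocab <- sort(unique(unlist(docs)))
V <- length(vocab)

# Co-occurrence counts within a symmetric window of 1 token
C <- matrix(0, V, V, dimnames = list(vocab, vocab))
for (doc in docs) {
  for (i in seq_along(doc)) {
    for (j in seq_along(doc)) {
      if (i != j && abs(i - j) <= 1) {
        C[doc[i], doc[j]] <- C[doc[i], doc[j]] + 1
      }
    }
  }
}

# Factorise: keep the first K singular dimensions as embeddings
K <- 2
s <- svd(C)
embeddings <- s$u[, 1:K] %*% diag(s$d[1:K])
rownames(embeddings) <- vocab

# Words with similar neighbours end up with similar vectors
cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))
cosine(embeddings["rise", ], embeddings["fall", ])
```

Real applications swap the toy counts for a large corpus and \(K \approx 100\text{–}300\), but the mechanics — count co-occurrences, factorise, measure similarity — are exactly these.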

Other approaches
LSA (PCA on DTM), pLSA (probabilistic LSA), NMF (non-negative matrix factorisation) all share the same goal: reduce dimensionality from \(V\) (vocab size) to \(K\) (topic count). LDA adds Dirichlet priors and is the most widely adopted.
Transformers and Attention
Masked word prediction
Given “As a leading firm in the [MASK] sector, we hire highly skilled …”
| Model | Context | Provider | Notes |
|---|---|---|---|
| GPT-4.1 | 1M tokens | OpenAI | Current flagship |
| Claude Opus 4.6 / Sonnet 4.6 | 200k tokens | Anthropic | Strong on reasoning + code |
| Llama 4 | 128k–10M tokens | Meta (open) | Open weights, many sizes |
| DeepSeek V3 / R1 | 128k tokens | DeepSeek (open) | Reasoning at low cost |
| Gemini 2.5 | 1M+ tokens | Google | Multimodal |
Key trend: open-weight models are closing the gap; costs collapsed 60x since 2023. Check current benchmarks — this table goes stale fast.
The ellmer package
But watch out
API outputs are stochastic (set temperature = 0 for near-deterministic results), models update without notice, and costs can surprise at scale.
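Setting the temperature in ellmer looks roughly like this — a sketch assuming a recent ellmer version where the `params()` helper is available (older versions passed `api_args = list(temperature = 0)` instead):

```r
library(ellmer)

# Near-deterministic sampling for measurement tasks
chat <- chat_openai(
  model  = "gpt-4.1-mini",
  params = params(temperature = 0)
)
chat$chat("Classify as hawkish/dovish/neutral. Return ONLY the label.
Text: The Committee decided to raise the target range.")
```

Even at temperature 0, providers do not guarantee bit-identical outputs across calls, so log everything (see Reproducibility below).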
ellmer package: tidyverse-native LLM interface

# install.packages("ellmer")
library(ellmer)
# Connect to any provider via OpenRouter, OpenAI, Anthropic, etc.
chat <- chat_openai(
model = "gpt-4.1-mini",
system_prompt = "You are a helpful economics research assistant."
)
# Simple text query
chat$chat("Summarize the main argument of Acemoglu et al. 2001 in two sentences.")

ellmer supports OpenAI, Anthropic, OpenRouter (access to 100+ models), Ollama (local models), and more – all with the same interface.
library(ellmer)
library(purrr)
classify_sentiment <- function(text) {
chat <- chat_openai(model = "gpt-4.1-mini")
chat$chat(paste0(
"Classify the sentiment of this central bank statement as ",
"'hawkish', 'dovish', or 'neutral'. ",
"Return ONLY the label.\n\nText: ", text
))
}
# Apply to a data frame of speeches
speeches$sentiment <- map_chr(speeches$text, classify_sentiment)

Cost estimate
10,000 short paragraphs with gpt-4.1-mini \(\approx\) $0.30. With DeepSeek R1 via OpenRouter \(\approx\) $0.05.
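Back-of-the-envelope numbers like these are easy to script. A minimal cost estimator, where the token counts and per-1M-token prices are illustrative assumptions (check current provider pricing):

```r
# Rough API cost estimate for a batch annotation job
estimate_cost <- function(n_docs, tokens_per_doc,
                          price_in_per_1m, price_out_per_1m,
                          tokens_out_per_doc = 10) {
  input_cost  <- n_docs * tokens_per_doc     * price_in_per_1m  / 1e6
  output_cost <- n_docs * tokens_out_per_doc * price_out_per_1m / 1e6
  input_cost + output_cost
}

# 10,000 short paragraphs (~150 tokens each), one-word labels out
# Illustrative prices: $0.15 / 1M input tokens, $0.60 / 1M output tokens
estimate_cost(10000, 150, 0.15, 0.60)
#> [1] 0.285
```

The lesson generalizes: for short-label annotation, input tokens dominate, so price the input side of the model first.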
JSON schemas for reproducible measurement
ellmer: sentiment + entities

library(ellmer)
# Define the output type
type_analysis <- type_object(
sentiment = type_enum("hawkish", "dovish", "neutral",
.description = "Overall monetary policy sentiment"),
confidence = type_number(.description = "Confidence score 0-1"),
key_entities = type_array(items = type_string(),
.description = "Named entities mentioned (central banks, countries, etc.)"),
summary = type_string(.description = "One-sentence summary")
)
chat <- chat_openai(model = "gpt-4.1-mini")
result <- chat$extract_data(
"The ECB kept rates unchanged but signaled that inflation risks remain tilted
to the upside, suggesting further tightening may be needed in Q3.",
type = type_analysis
)

This is a native R list – no parsing needed. Bind rows across documents to get a tidy data frame.
type_trade_event <- type_object(
event_type = type_enum("tariff", "sanction", "quota", "subsidy", "other"),
countries = type_array(items = type_string()),
products = type_array(items = type_string()),
direction = type_enum("restrictive", "liberalizing", "neutral"),
date_mentioned = type_string(.description = "Date if mentioned, else 'NA'")
)
# Process 5,000 news articles
results <- map(articles$text, \(txt) {
chat <- chat_openai(model = "gpt-4.1-mini")
chat$extract_data(txt, type = type_trade_event)
})
trade_events <- bind_rows(results)

This replaces weeks of manual coding with hours of API calls and pennies of cost.
Designing prompts that measure what you mean
system_prompt <- "
You classify FOMC statements by monetary policy stance.
Examples:
- 'The Committee decided to raise the target range for the federal funds
rate to 5 to 5-1/4 percent.' -> hawkish
- 'The Committee decided to lower the target range by 50 basis points.'
-> dovish
- 'The Committee decided to maintain the target range.' -> neutral
Classify the following statement. Return ONLY the label.
"
chat <- chat_openai(model = "gpt-4.1", system_prompt = system_prompt)
chat$chat(new_statement)

system_prompt <- "
You are an expert trade policy analyst. For each news article:
1. Identify whether a trade policy event is described.
2. If yes, determine the type (tariff, sanction, quota, subsidy, other).
3. Identify affected countries and products.
4. Assess whether the measure is restrictive or liberalizing.
Think step-by-step before providing your final answer.
Return your analysis as JSON.
"

Chain-of-thought prompting improves accuracy on multi-step reasoning tasks by 10–30% (Wei et al. 2022). The model “shows its work” before committing to an answer.
Reproducibility
Model versions change. Always log the exact model ID, prompt text, temperature, and date of each API call. Pin model versions where possible (e.g. gpt-4.1-2024-08-06).
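This logging discipline takes about ten lines. A minimal sketch — the function name, CSV layout, and fields are illustrative, not a standard:

```r
# Minimal audit log for LLM-based measurement runs:
# record the exact model ID, temperature, prompt, and call date
log_llm_run <- function(model, prompt, temperature,
                        log_file = "llm_runs.csv") {
  entry <- data.frame(
    timestamp      = format(Sys.time(), "%Y-%m-%d %H:%M:%S"),
    model          = model,
    temperature    = temperature,
    prompt_preview = substr(prompt, 1, 80),
    stringsAsFactors = FALSE
  )
  write.table(entry, log_file, append = file.exists(log_file),
              col.names = !file.exists(log_file),
              row.names = FALSE, sep = ",")
  invisible(entry)
}

log_llm_run("gpt-4.1-2024-08-06", "Classify the sentiment of ...", 0)
```

Commit the log file alongside your code so referees can see exactly which model produced which variable.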
Four measurement problems
For document embeddings, use text-embedding-3-large (OpenAI) or sentence-transformers for 768+ dimensional vectors.

LLM-generated variables introduce unique challenges for causal inference:
gpt-4 in 2024 \(\neq\) gpt-4.1 in 2026. Pin model versions and log everything.

For a thorough treatment, see Ash & Hansen (2023), “Text Algorithms in Economics”, Annual Review of Economics.
Application: AI-Generated Production Networks
Hallucination, Reproducibility, Cost
- gpt-4 in January 2024 \(\neq\) gpt-4 in January 2025.
- Even at temperature = 0, outputs may vary slightly across API calls.
- Pin model versions where possible (e.g. gpt-4.1-2024-08-06).

| Date | Model | Cost per 1M input tokens |
|---|---|---|
| Mar 2023 | GPT-4 | ~$30.00 |
| Nov 2023 | GPT-4 Turbo | ~$10.00 |
| May 2024 | GPT-4o | ~$2.50 |
| Jul 2024 | GPT-4o-mini | ~$0.15 |
| Jan 2025 | DeepSeek R1 | ~$0.55 |
| Apr 2025 | GPT-4.1-mini | ~$0.10 |
| Feb 2026 | Frontier models | ~$0.10–$2.00 |
Rule of thumb
Use LLMs for annotation and measurement (replacing human coders), not as a substitute for econometric identification.
RAG: Retrieval-Augmented Generation
Problem: LLMs hallucinate facts and can’t access your private data.
Solution: Retrieval-Augmented Generation — retrieve relevant source documents, then pass them to the LLM alongside your question.
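The retrieval step can be sketched without any embedding API at all: represent documents and the query as term-frequency vectors, rank by cosine similarity, and paste the best match into the prompt. This swaps dense embeddings (e.g. text-embedding-3-large, which you would use in practice) for simple keyword overlap; the documents and query are illustrative.

```r
docs <- c(
  "The ECB raised its deposit rate by 25 basis points in June.",
  "Eurozone unemployment fell to a record low last quarter.",
  "The Federal Reserve kept its target range unchanged."
)
query <- "What did the ECB do to rates?"

tokenize <- function(x) strsplit(tolower(gsub("[[:punct:]]", "", x)), "\\s+")[[1]]

# Term-frequency vectors over a shared vocabulary
vocab <- unique(unlist(lapply(c(docs, query), tokenize)))
tf <- function(x) as.numeric(table(factor(tokenize(x), levels = vocab)))

cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))
scores <- sapply(docs, function(d) cosine(tf(d), tf(query)))

# Retrieve the best-matching document and build an augmented prompt
context <- docs[which.max(scores)]
prompt  <- paste0("Answer using ONLY this context:\n", context,
                  "\n\nQuestion: ", query)
```

The prompt is then sent to the LLM as usual; because the answer must come from the retrieved context, hallucination risk drops and the source is auditable.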
Why this matters for economists:
Always validate LLM labels against human coders. Cohen’s \(\kappa\) measures inter-rater agreement beyond chance:
library(irr)
# human_labels and llm_labels are character/factor vectors of same length
human_labels <- c("hawkish", "dovish", "neutral", "hawkish", "dovish")
llm_labels <- c("hawkish", "dovish", "hawkish", "hawkish", "dovish")
# Compute Cohen's kappa
kappa_result <- kappa2(cbind(human_labels, llm_labels))
kappa_result
#> Cohen's Kappa for 2 Raters
#> Kappa = 0.667

| \(\kappa\) | Interpretation |
|---|---|
| < 0.20 | Poor |
| 0.21–0.40 | Fair |
| 0.41–0.60 | Moderate |
| 0.61–0.80 | Substantial |
| > 0.80 | Almost perfect |
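Cohen's \(\kappa\) is simple enough to compute by hand, which makes a good sanity check on package output: \(\kappa = (p_o - p_e)/(1 - p_e)\), where \(p_o\) is observed agreement and \(p_e\) is the agreement expected by chance from the two raters' marginal label shares. Using the same toy vectors as above:

```r
human <- c("hawkish", "dovish", "neutral", "hawkish", "dovish")
llm   <- c("hawkish", "dovish", "hawkish", "hawkish", "dovish")

labels <- union(human, llm)
p_o <- mean(human == llm)                        # observed agreement: 4/5
p_h <- table(factor(human, levels = labels)) / length(human)
p_l <- table(factor(llm,   levels = labels)) / length(llm)
p_e <- sum(p_h * p_l)                            # chance agreement: 0.40

kappa <- (p_o - p_e) / (1 - p_e)
kappa
#> [1] 0.6666667
```

With only five observations the estimate is noisy, of course — in practice validate on a few hundred hand-coded documents.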
Understanding LLMs by building one
We build a character-level mini-LLM in R with torch. Training = adjusting weights so the model gets better at predicting the next character in ECB speeches.
Input: "The ECB has decided to maintain intere"
→ Character embedding (128 dimensions)
→ 2 Transformer blocks (4 attention heads each)
→ Linear projection → softmax
→ Predict next character: "s" (for "interest")
# Prerequisites (install once):
# install.packages("torch")
# torch::install_torch()
# Run the mini-LLM training script:
source("code/05-mini_llm.R")
# After training, generate text:
generate(model, start_str = "The ECB has decided",
max_tokens = 200, temperature = 0.8)
#> "The ECB has decided to maintain interest rates at their
#> present levels. The current monetary policy stance remains
#> accommodative and the inflation expectations..."

See code/05-mini_llm.R for the full implementation.
Recap & Further Directions
- LLM APIs (ellmer, OpenRouter) make classification, extraction, and annotation almost turn-key.
- ellmer – tidyverse-native LLM interface for R.
- text2vec, quanteda – traditional NLP in R.
- spacyr, udpipe – linguistic annotation.