01. Getting Started

Data Science for Economists

2026-04-01

What this course is about

  • Data science skills complementary to standard econometrics
  • Data cleaning and wrangling, visualization, databases, machine learning, etc.
  • Research in international economics, broadly defined, is shifting towards empirical work
    • I never had a course like this, but wish I had.

Who we are

name / program / coding background

This Course

  • Weekly sessions over the summer semester (Apr–Jul)
  • Each session: mix of slides and hands-on coding
  • Goal: leave with a working toolkit, not just slides
  • Taught with Irene Iodice and Hendrik Malko

Course overview

Schedule

Week    Module                    Topics
Apr 15  Getting Started           Reproducibility, Git, Docker, IDE setup
Apr 22  Toolkit                   Shell basics, R fundamentals, Quarto
Apr 29  Large Structured Data     Millions of rows: data.table, Parquet, duckplyr
May 06  Web Scraping & APIs       HTML parsing, APIs, online prices
May 13  Text as Data              Tokenization, bag-of-words, policy uncertainty
May 20  Spatial & Satellite Data  CRS, nightlights, satellite imagery in R

Schedule (cont.)

Week    Module                 Topics
May 27  TBD
Jun 03  Time as Data           Event studies, diff-in-diff, causal inference
Jun 10  Machine Learning       Model selection, regularization, causal forests
Jun 17  Large Language Models  LLM APIs, structured output, training a mini-LLM
Jun 24  No class
Jul 01  AI-Assisted Research   CLAUDE.md, agents, skills, LLM workflows

Web Scraping

  • “One Billion Prices Project”: Web-scraped prices for many stores and countries (Cavallo & Rigobon, 2016)
  • Are online prices different from offline prices?
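
To make the question above concrete, here is a minimal sketch of how matched online and offline prices could be compared. The product names, prices, and the `compare_prices` helper are hypothetical and made up for illustration; they are not taken from Cavallo & Rigobon (2016) and do not reproduce their method. The sketch only computes the share of products with identical prices and the average percentage gap.

```python
from statistics import mean

# Hypothetical matched observations: (product, online price, offline price).
# These numbers are invented for illustration; they are not from any dataset.
matched_prices = [
    ("milk_1l", 1.09, 1.09),
    ("coffee_500g", 5.49, 5.99),
    ("usb_cable", 7.99, 7.99),
    ("shampoo", 3.29, 3.19),
]


def compare_prices(rows, tolerance=0.005):
    """Return (share of identical prices, mean online-minus-offline gap in percent)."""
    # Treat two prices as identical if they differ by less than half a cent.
    identical = [abs(online - offline) <= tolerance for _, online, offline in rows]
    # Percentage gap relative to the offline price; negative means cheaper online.
    pct_gaps = [100 * (online - offline) / offline for _, online, offline in rows]
    return sum(identical) / len(rows), mean(pct_gaps)


share_identical, mean_gap_pct = compare_prices(matched_prices)
print(f"identical: {share_identical:.0%}, mean gap: {mean_gap_pct:.2f}%")
```

With real scraped data, the hard part is usually matching the same product across the online and offline samples; the comparison itself stays this simple.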

Satellite Imagery

Large Structured Data

[Figure panels: arithmetic scale; log-log scale]

Text as Data

Time as Data