03 — Large Structured Data
Working with large datasets using data.table, DuckDB, and Arrow.

This module covers techniques for handling large structured datasets, with applications to firm-level trade data and Pareto distributions, using modern tools such as data.table, DuckDB, and Apache Arrow.
Lecture slides
Code
Check the course repository for the R scripts.
Key packages: data.table, arrow, duckdb, ggplot2
Further resources
data.table
- The official data.table vignette
- Grant McDermott’s data.table slides
- data.table cheat sheet