03 — Large Structured Data

Working with large datasets using data.table, DuckDB, and Arrow.

Lecturer
Date

March 1, 2026

A port with ships and containers.

This module covers techniques for handling large structured datasets. Topics include firm-level trade data, Pareto distributions, and modern tools such as data.table, DuckDB, and Apache Arrow.

Lecture slides


Code

Check the course repository for the R scripts.

Key packages: data.table, arrow, duckdb, ggplot2
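To give a feel for how these packages relate, here is a minimal sketch that runs the same grouped aggregation once in data.table and once in DuckDB. It is not taken from the course scripts: the simulated table `firms`, its columns, the three country codes, and the Pareto parameters (scale 1, shape 1.5) are all assumptions made for illustration. Arrow is left out to keep the example short.

```r
library(data.table)
library(DBI)
library(duckdb)

set.seed(1)
n <- 1e5

# Hypothetical firm-level data: export values drawn from a Pareto
# distribution via inverse-CDF sampling, x = x_min * U^(-1 / alpha),
# with assumed x_min = 1 and alpha = 1.5.
firms <- data.table(
  firm_id = seq_len(n),
  country = sample(c("DE", "FR", "IT"), n, replace = TRUE),
  exports = 1 * runif(n)^(-1 / 1.5)
)

# data.table: total exports and firm count per country.
dt_res <- firms[, .(total = sum(exports), n_firms = .N), by = country][order(country)]

# DuckDB: the same aggregation in SQL, run against the in-memory
# data frame registered as a virtual table (no copy to disk).
con <- dbConnect(duckdb::duckdb())
duckdb_register(con, "firms", firms)
db_res <- dbGetQuery(con, "
  SELECT country, SUM(exports) AS total, COUNT(*) AS n_firms
  FROM firms
  GROUP BY country
  ORDER BY country
")
dbDisconnect(con, shutdown = TRUE)

# Both engines should agree up to floating-point tolerance.
same_totals <- isTRUE(all.equal(dt_res$total, db_res$total))
print(same_totals)
```

The data.table version keeps everything in R syntax, while the DuckDB version uses SQL, which can be convenient when the data live in files that are too large to load into memory.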

Further resources

data.table

ggplot2

DuckDB & Arrow