Data Science for Economists
2026-03-01
Covered in the previous session — remember the 8 building blocks from Gentzkow & Shapiro:
Automation, Version Control, Directories, Keys, Abstraction, Documentation, Management, Code Style
Everything we do today serves these principles.
Shell / Bash
Navigation, Files and Directories
username denotes a specific user
hostname denotes the name of the computer
:~ denotes the directory path (where ~ signifies the user’s home directory)
$ denotes the start of the command prompt (# for root)
Command syntax: command option(s) argument(s)
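The command option(s) argument(s) pattern can be sketched with a few everyday file commands. The directory and file names below (project, raw.csv, notes.txt) are made up for illustration, and everything runs inside a scratch directory from mktemp so nothing in your home directory is touched.

```shell
# Remember where we started, then work in a throwaway directory.
start=$(pwd)
scratch=$(mktemp -d)
cd "$scratch"

# command = mkdir, option = -p (create parent directories as needed), argument = the path
mkdir -p project/data

# command = touch, no options, two arguments
touch project/data/raw.csv project/notes.txt

# command = cp, arguments = source and destination (a new name is given here)
cp project/notes.txt project/notes_backup.txt

# command = mv, used here to rename a file
mv project/notes_backup.txt project/notes_old.txt

# command = ls, option = -1 (one entry per line), argument = the directory
listing=$(ls -1 project)
echo "$listing"

# command = rm, options = -r (recursive) and -f (force), argument = the directory
cd "$start"
rm -rf "$scratch"
```

The listing should show three entries: data, notes.txt and notes_old.txt (their order depends on your locale).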
pwd to print the working directory
cd to change directory
touch and mkdir to create files and directories
rm to remove, with the recursive (-r or -R) and “force” (-f) options
cp object path/copyname to copy (keeps the old name if no new one is given)
mv object path/newobjectname to move or rename
Working with Text and Pipes
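cat (“concatenate”) to print file contents
head and tail to show the first and last lines
grep (“Global regular expression print”) to search for matching lines
Redirect output to a file with >, append with >> (> overwrites)
Pipe the output of one command into the next with |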
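The text commands, redirection and the pipe can be tried together in a short sketch. The file name countries.txt and its contents are invented for illustration, and the file lives in a scratch directory.

```shell
scratch=$(mktemp -d)
file="$scratch/countries.txt"   # hypothetical file name

# > creates the file (or overwrites it); >> appends to it
echo "France"  >  "$file"
echo "Germany" >> "$file"
echo "Ghana"   >> "$file"
echo "Greece"  >> "$file"

cat "$file"         # print the whole file
head -n 2 "$file"   # first two lines
tail -n 1 "$file"   # last line

# Pipe: grep keeps the lines that start with G, wc -l counts them
g_count=$(grep "^G" "$file" | wc -l)
echo "Lines starting with G: $g_count"

# A second > replaces the old contents entirely
echo "Italy" > "$file"
remaining=$(cat "$file")
echo "After overwriting: $remaining"

rm -r "$scratch"
```

Swapping the last > for >> would instead leave all five country names in the file.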
Loops and Scripting
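As a minimal sketch of a loop inside a script: the block below writes a small script, marks it executable, and runs it. The script name greet.sh and the list of names are assumptions made up for this example.

```shell
scratch=$(mktemp -d)
script="$scratch/greet.sh"   # hypothetical script name

# Write the script. Quoting 'EOF' keeps $name from expanding until the script runs.
cat > "$script" <<'EOF'
#!/bin/sh
# Loop over a fixed list and print one line per item.
for name in alice bob carol; do
  echo "Hello, $name"
done
EOF

chmod +x "$script"       # mark the file as executable

# Naming the interpreter explicitly works everywhere; after chmod +x,
# running the file directly would use the #!/bin/sh line instead.
output=$(sh "$script")
echo "$output"

rm -r "$scratch"
```

The same for ... do ... done pattern works over file names, for example looping over every .csv file in a directory.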
.sh files with code can be executed
#!/bin/sh is a shebang, indicating which program to run the script with
Rscript runs R scripts from the command line
Make
make automates the sequence from raw data → results → paper
Rules are written in a Makefile
# Makefile
# Each rule is "target: dependencies", followed by a recipe line that must start with a tab.
paper.pdf: paper.tex figures/plot.png
	pdflatex paper.tex
figures/plot.png: output/results.csv code/plot.R
	Rscript code/plot.R
output/results.csv: input/data.csv code/analysis.R
	Rscript code/analysis.R
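Each rule in the Makefile names a target, its dependencies and a recipe. make reruns a recipe when the target is missing or older than one of its dependencies. The shell sketch below imitates only that timestamp check for a single target and a single dependency; it is not how make is implemented, and the helper rebuild_if_stale and the file names are hypothetical.

```shell
scratch=$(mktemp -d)
dep="$scratch/data.csv"          # stand-in for a dependency
target="$scratch/results.csv"    # stand-in for a target

# Illustrative helper, not part of make: rebuild when the target is
# missing or the dependency is newer (-nt is supported by bash, dash and ksh).
rebuild_if_stale() {
  if [ ! -e "$target" ] || [ "$dep" -nt "$target" ]; then
    cp "$dep" "$target"          # stand-in for the real recipe
    echo "rebuilt"
  else
    echo "up to date"
  fi
}

echo "x,y" > "$dep"
touch -t 202001010000 "$dep"     # pin the dependency's timestamp
first=$(rebuild_if_stale)        # target does not exist yet

touch -t 202001020000 "$target"  # target is now newer than the dependency
second=$(rebuild_if_stale)

touch -t 202001030000 "$dep"     # dependency edited after the last build
third=$(rebuild_if_stale)

echo "$first / $second / $third"
rm -r "$scratch"
```

The three calls print rebuilt, up to date and rebuilt, which mirrors what happens when you build once, build again with no changes, and then edit an input.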
Run make and it figures out what needs rebuilding
R Basics
Equality (==), matching (%in%)
all.equal() to test near-equality of floating-point numbers
Assignment with = or <-
Help with help(plot) or ?plot
Comments with #
Data frames (also data.table and tibble)
Indexing with [] and $
$ and the Global Environment
lm()
pacman: single-line install + load; good for reproducible teaching setups
The native pipe |>; we use |> throughout this course
%>% (magrittr pipe) in older code: same idea, but requires library(magrittr) or library(tidyverse)
You’ll encounter all three. They’re all rectangular data, with different trade-offs:
| | data.frame | tibble | data.table |
|---|---|---|---|
| Package | base R | tidyverse | data.table |
| Printing | all rows | fits screen | fits screen |
| Speed | slow | slow | very fast |
| Syntax | df[row, col] | dplyr verbs | dt[i, j, by] |
| Best for | small data | tidy pipelines | large data |
tibble is a data.frame with nicer defaults
data.table modifies in place, which is crucial for memory on large data
A Quarto .qmd file combines prose, code, and output
Preferred over R Markdown (.Rmd) for new projects
Wrap Up