02. Toolkit: R and the Shell

Data Science for Economists

2026-03-01

Session Roadmap

  • AI and your toolkit
  • Shell essentials
  • Make (brief)
  • Git (refresher)
  • R basics
  • Quarto

AI and Your Toolkit

Why learn the toolkit?

AI coding agents are powerful — but they are not magic.

Claude Code and Codex are terminal-based agents that:

  • read and write files (cat, ls, mkdir)
  • run shell commands (Rscript, make, git)
  • edit code, check output, iterate

They use the exact same tools you learn this semester.

If you understand the toolkit, you can:

  • follow what an agent is doing
  • judge whether the output is what you actually wanted
  • fix things when it gets them wrong

GitHub Copilot: your coding companion

Ghost text — inline completions as you type:

  • Write a comment describing what you want
  • Copilot suggests the code in grey
  • Press Tab to accept, Esc to dismiss

Particularly useful for:

  • ggplot2 plots from natural language descriptions
  • data.table syntax (:=, .SD, by)
  • regex patterns and string manipulation
  • boilerplate (file I/O, API calls, package loading)

We will use Copilot throughout this course — it makes you faster, not lazier.

Reproducibility (Recap)

Covered in the previous session — remember the 8 building blocks from Gentzkow & Shapiro:

Automation, Version Control, Directories, Keys, Abstraction, Documentation, Management, Code Style

Everything we do today serves these principles.

Shell / Bash

Shell

  • Terminology: shell, terminal, tty, command prompt, etc.
    • Used loosely for the same thing: a command line interface (CLI)
  • Many shell variants: focus on Bash (“Bourne again shell”)
  • Included by default on Linux and macOS (macOS has opened zsh by default since Catalina, but Bash is still installed)
  • Windows users need to install a Bash-compatible shell

Why the Shell?

  • Powerful: executing commands and fixing problems
    • some things you just can’t do in an IDE or GUI
  • Reproducibility: scripting is reproducible, clicking is not
  • Remote: interacting with servers and supercomputers
  • Automation: workflow and analysis pipelines, e.g. with Makefile

Navigation, Files and Directories

Basics

username@hostname:~$
  • username denotes a specific user
  • hostname denotes name of the computer
  • :~ denotes the directory path (where ~ signifies the user’s home directory)
  • $ marks the end of the prompt; commands are typed after it (# for root)

Keyboard Shortcuts

  • Tab completion
  • Up/Down keys to scroll through previous commands
  • Ctrl + Right (and Ctrl + Left) to skip whole words at a time
  • Ctrl + a moves the cursor to the beginning of the line
  • Ctrl + e moves the cursor to the end of the line
  • Ctrl + k deletes everything to the right of the cursor
  • Ctrl + u deletes everything to the left of the cursor
  • Ctrl + Shift + c to copy and Ctrl + Shift + v to paste (Linux; macOS uses Cmd + c / Cmd + v)

Syntax

command option(s) argument(s)

astronaut $ ls -lh
total 4.0K
drwxr-xr-x 3 astronaut astronaut  96 Apr 26 19:03 01-getting-started
drwxr-xr-x 2 astronaut astronaut  64 Apr 26 19:03 02-toolkit
-rw-r--r-- 1 astronaut astronaut 135 Apr 19 15:43 README.md
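  • Short options start with a single dash and are usually one letter (-l)
  • Several short options can be combined after one dash (-lah); long options start with two dashes (--human-readable)
  • The long forms below are GNU ls options; BSD ls on macOS may not accept them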
$ ls -lah 01-getting-started/
$ ls --group-directories-first --human-readable 01-getting-started/
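
A small sketch of combining short options, run in a throwaway directory created with mktemp (the file names are made up for illustration). It sticks to -l, -a and -h, which both GNU and BSD ls understand.

```shell
# Work in a scratch directory so nothing real is touched
demo_dir=$(mktemp -d)
touch "$demo_dir/a.txt" "$demo_dir/.hidden"

# Three short options given separately ...
separate=$(ls -l -a -h "$demo_dir")
# ... and the same three combined after a single dash
combined=$(ls -lah "$demo_dir")

# Both spellings should produce the same listing
if [ "$separate" = "$combined" ]; then
  echo "identical output"
fi

rm -r "$demo_dir"
```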

Create Files and Directories

  • touch and mkdir
$ mkdir testing
$ touch testing/test1.txt testing/test2.txt testing/test3.txt
$ ls testing
test1.txt  test2.txt  test3.txt

Removing Files and Directories

  • rm
$ rm testing/test1.txt
$ ls testing
test2.txt  test3.txt
$ rm testing
rm: cannot remove 'testing': Is a directory
$ rm -rf testing
$ ls testing
ls: cannot access 'testing': No such file or directory
  • “recursive” (-r or -R) and “force” (-f) options
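
rm -rf deletes without asking, and the shell has no trash to recover from. One more cautious pattern, sketched here in a scratch directory with hypothetical names, is to use rmdir, which refuses to delete a directory that still has files in it.

```shell
scratch=$(mktemp -d)
mkdir "$scratch/testing"
touch "$scratch/testing/test1.txt"

# rmdir only removes empty directories, so this attempt fails and nothing is lost
if rmdir "$scratch/testing" 2>/dev/null; then
  rmdir_result="removed"
else
  rmdir_result="refused"
fi
echo "rmdir on a non-empty directory: $rmdir_result"

# Remove the contents explicitly, then the now-empty directories
rm "$scratch/testing/test1.txt"
rmdir "$scratch/testing"
rmdir "$scratch"
```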

Copying

  • cp object path/copyname (keeps old name if not provided with new one)
$ touch example.txt
$ mkdir testing
$ cp example.txt testing
$ ls testing
example.txt
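
Copying a whole directory needs the recursive option, just as removing one does. A minimal sketch, assuming a scratch directory and a hypothetical backup name:

```shell
scratch=$(mktemp -d)
mkdir "$scratch/testing"
touch "$scratch/testing/example.txt"

# -r copies the directory together with everything inside it
cp -r "$scratch/testing" "$scratch/testing_backup"

# The copy contains the same file as the original
copied=$(ls "$scratch/testing_backup")
echo "$copied"

rm -r "$scratch"
```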

Moving and Renaming

  • mv object path/newobjectname
$ mv example.txt testing/example2.txt
$ ls testing
example2.txt  example.txt
$ mv testing/example2.txt testing/example_new.txt
$ ls testing
example_new.txt  example.txt

Working with Text and Pipes

Working with Text Files

  • Print whole file with cat (“concatenate”)
$ cat -n input/sonnets.txt
  • Print only first or last lines with head and tail
$ head -n 3 input/sonnets.txt   ## First 3 lines
$ tail -n 1 input/sonnets.txt   ## Last line

Working with Text Files: grep

  • Search within files: grep (“global regular expression print”)
  • Count lines, words and bytes with wc
$ wc input/sonnets.txt
 2633 17698 95662 input/sonnets.txt

$ grep -n "Shall I compare thee" input/sonnets.txt
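
A few commonly used grep and wc options, sketched on a three-line sample file so that it runs without input/sonnets.txt. The sample lines are the opening of Sonnet 18, and the temporary file is created only for this illustration.

```shell
sample=$(mktemp)
printf '%s\n' \
  "Shall I compare thee to a summer's day?" \
  "Thou art more lovely and more temperate:" \
  "Rough winds do shake the darling buds of May," > "$sample"

# -n prefixes each matching line with its line number
grep -n "summer" "$sample"

# -i ignores case, -c counts matching lines instead of printing them
thou_count=$(grep -i -c "thou" "$sample")
echo "Lines mentioning 'thou': $thou_count"

# wc -l counts lines; reading from stdin keeps the file name out of the output
# (tr strips the padding that some wc versions add)
line_count=$(wc -l < "$sample" | tr -d ' ')
echo "Total lines: $line_count"

rm "$sample"
```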

Redirect

  • Send output from the shell to a file using redirect operator >
$ echo "At first, I was afraid, I was petrified" > survive.txt
$ find survive.txt
survive.txt
  • To append to a file, use >> (> overwrites)
$ echo "'Kept thinking I could never live without you by my side" >> survive.txt
$ cat survive.txt
At first, I was afraid, I was petrified
'Kept thinking I could never live without you by my side
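
The difference between > and >> is easiest to see by counting lines after each step. A small sketch using a temporary file (the contents are placeholders):

```shell
out=$(mktemp)

echo "first line"  > "$out"    # creates the file, or overwrites it if it exists
echo "second line" >> "$out"   # appends to the end
lines_after_append=$(wc -l < "$out" | tr -d ' ')

echo "replacement" > "$out"    # overwrites: the earlier two lines are gone
lines_after_overwrite=$(wc -l < "$out" | tr -d ' ')

echo "after append: $lines_after_append, after overwrite: $lines_after_overwrite"
rm "$out"
```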

Pipes

  • Send (“pipe”) output to another command with |
    • chain together a sequence of simple operations
$ cat -n input/sonnets.txt | head -n100 | tail -n10
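
Pipes pay off when several small tools are chained. The sketch below finds the most frequent line in a made-up four-word input: sort groups identical lines, uniq -c collapses them with a count, sort -rn puts the largest count first, head keeps the top entry, and awk prints its second field (the word).

```shell
top_word=$(
  printf '%s\n' apple banana apple cherry |
    sort |
    uniq -c |
    sort -rn |
    head -n 1 |
    awk '{print $2}'
)
echo "Most frequent: $top_word"
```

Each stage reads only what the previous one wrote, so any stage can be swapped or removed to inspect the intermediate result.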

Loops and Scripting

Loops

  • Repeat operation over a set: Loops
for i in LIST
do
  OPERATION $i
done
  • Example: numbering text files
$ n=1
$ for f in input/*.txt
> do
>   echo "=== File $n: $f ==="
>   head -n 2 "$f"
>   n=$((n + 1))
> done
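
The same loop pattern works for file operations, not just printing. A sketch that makes a .bak copy of every .txt file in a scratch directory (the files are created here purely for illustration):

```shell
scratch=$(mktemp -d)
printf 'a\nb\n' > "$scratch/one.txt"
printf 'c\n'    > "$scratch/two.txt"

# Copy each .txt file to a .bak file; quoting "$f" keeps names with spaces intact
n=0
for f in "$scratch"/*.txt
do
  cp "$f" "$f.bak"
  n=$((n + 1))
done
echo "Backed up $n files"

# Count the backups that now exist
backups=$(ls "$scratch" | grep -c '\.bak$')
rm -r "$scratch"
```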

Scripting

  • .sh files with code can be executed
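#!/bin/bash
echo -e "\nHello World!\n"
  • #!/bin/bash is a shebang: it names the interpreter used when the script is executed directly (calling bash script.sh, as below, ignores it)
  • echo -e is a Bash feature; under plain sh it may print -e literally, hence bash in the shebang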
$ bash code/00-shell-exercise.sh
Hello World!
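
The shebang only comes into play when a script is executed directly. A sketch with a hypothetical hello.sh that also reads its first command-line argument as $1; it assumes the temporary directory allows executing files.

```shell
scratch=$(mktemp -d)

# Write a two-line script; the quoted 'EOF' stops the shell from expanding $1 here
cat > "$scratch/hello.sh" <<'EOF'
#!/bin/sh
echo "Hello, $1!"
EOF

# Mark it executable, then run it directly so the shebang picks the interpreter
chmod +x "$scratch/hello.sh"
greeting=$("$scratch/hello.sh" World)
echo "$greeting"

rm -r "$scratch"
```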

Running Other Languages from the Shell

  • Not limited to running shell scripts in the shell
  • Example: Rscript
$ Rscript -e 'cat("Hello World, from R!")'
Hello World, from R!

Make

Make: Automate Your Pipeline

  • make automates the sequence from raw data → results → paper
  • Define targets, prerequisites, and recipes in a Makefile
  • Only re-runs steps whose inputs have changed
# Makefile
# Note: each recipe line must begin with a tab character, not spaces
paper.pdf: paper.tex figures/plot.png
	pdflatex paper.tex

figures/plot.png: output/results.csv code/plot.R
	Rscript code/plot.R

output/results.csv: input/data.csv code/analysis.R
	Rscript code/analysis.R
  • Run make and it figures out what needs rebuilding
  • Change the data? Only the downstream steps re-run
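
make decides what to rebuild by comparing file modification times: a target is out of date when any prerequisite is newer than it. The rule can be imitated in plain shell. This is only an illustration of the comparison, not how make is implemented; the file names are hypothetical, and -nt is a widely supported extension to test rather than strict POSIX.

```shell
scratch=$(mktemp -d)

# Give the files explicit modification times (format: YYYYMMDDhhmm)
touch -t 202601010900 "$scratch/data.csv"      # prerequisite, changed at 09:00
touch -t 202601010800 "$scratch/results.csv"   # target, built earlier at 08:00

# -nt means "newer than": the prerequisite changed after the target was built
if [ "$scratch/data.csv" -nt "$scratch/results.csv" ]; then
  decision="rebuild"
else
  decision="up to date"
fi
echo "results.csv: $decision"

rm -r "$scratch"
```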

Git (Refresher)

Git: Track Everything

Module 01 introduced version control — here are the commands you’ll use daily:

$ git init                          # start tracking a project
$ git status                        # what changed?
$ git add code/01-analysis.R        # stage specific files
$ git commit -m "Add analysis"      # save a snapshot
$ git log --oneline                 # view history
  • Commit early, commit often — each commit is a save point you can return to
  • Write short, descriptive messages: what you did and why
  • git diff shows unstaged changes; git diff HEAD shows everything changed since the last commit

R Basics

  • A great calculator
  • Logic, negation, equality (==), matching (%in%)
    • careful: == on floating-point numbers, e.g. 0.1 + 0.2 == 0.3 is FALSE
    • better: isTRUE(all.equal(0.1 + 0.2, 0.3))
  • Assignment with = or <-
  • Questions? help(plot) or ?plot
  • Commenting with #

Objects

  • vectors
  • matrices
  • data frames (and derivatives like data.table and tibble)
  • lists
  • functions
  • etc.

Conversion Between Objects

# Create a small data frame called "d"
d = data.frame(x = 1:2, y = 3:4)
d
#>   x y
#> 1 1 3
#> 2 2 4

# Convert it to (i.e. create) a matrix called "m"
m = as.matrix(d)
m
#>      x y
#> [1,] 1 3
#> [2,] 2 4

Class, Type and Structure

# Evaluate its class
class(d)
#> [1] "data.frame"

# Evaluate its type
typeof(d)
#> [1] "list"

# Show its structure
str(d)
#> 'data.frame':   2 obs. of  2 variables:
#>  $ x: int  1 2
#>  $ y: int  3 4

Global Environment

# View d
View(d)
d
#>   x y
#> 1 1 3
#> 2 2 4

# Use d to run command
lm(y ~ x)
#> Error in eval(predvars, data, env) : object 'y' not found

lm(y ~ x, data = d)
#> Call:
#> lm(formula = y ~ x, data = d)
#>
#> Coefficients:
#> (Intercept)            x
#>           2            1

Reserved Words

  • Fundamental commands, operators and relations cannot be reassigned
if
else
while
function
for
TRUE
FALSE
NULL
Inf
NaN
NA

Semi-reserved Words

my_vector = c(1, 2, 5)
my_vector
#> [1] 1 2 5

c = 4
c(1, 2, 5)
#> [1] 1 2 5
c
#> [1] 4

pi
#> [1] 3.141593

pi = 2
pi
#> [1] 2

Indexing: []

a = 1:10
a[4]
#> [1] 4
a[c(4, 6)]
#> [1] 4 6

m[1, 1]
#> x
#> 1

my_list = list(a = "hello", b = c(1, 2, 3),
               c = data.frame(x = 1:5, y = 6:10))
my_list[[1]]
#> [1] "hello"
my_list[[2]][3]
#> [1] 3

Indexing: $

my_list
#> $a
#> [1] "hello"
#>
#> $b
#> [1] 1 2 3
#>
#> $c
#>   x  y
#> 1 1  6
#> 2 2  7
#> 3 3  8
#> 4 4  9
#> 5 5 10

Indexing: $ (continued)

my_list$a
#> [1] "hello"

my_list$b[3]
#> [1] 3

my_list$c$x
#> [1] 1 2 3 4 5

Indexing: $ and the Global Environment

# Remember the earlier problem?
lm(d$y ~ d$x)
#> Call:
#> lm(formula = d$y ~ d$x)
#>
#> Coefficients:
#> (Intercept)          d$x
#>           2            1

Functions

  • A lot of functionality in “base R”
    • in-built functions, like lm()
  • User-built functions are easy to implement
example_function = function(a, b) {
  output = a + b
  return(output)
}
example_function(1, 2)
#> [1] 3

Installing Packages

# pacman: install-if-missing + load
if (!require("pacman")) install.packages("pacman"); library(pacman)
p_load(data.table)
p_load(ggplot2)
p_load_current_gh("ropensci/rnaturalearthhires")  # from GitHub
  • pacman — single-line install + load; good for reproducible teaching setups

Libraries

  • Community-built collections of functions: packages (often called libraries, after the library() call that loads them)
library(data.table)
#> data.table 1.16.4 using 4 threads (see ?getDTthreads).
#> Latest news: r-datatable.com

The Pipe: %>%

  • The pipe passes the left-hand side as the first argument of the right-hand side
# Without pipe
head(subset(mtcars, cyl == 4), 3)

# With pipe %>%
library(magrittr)
mtcars %>% subset(cyl == 4) %>% head(3)
  • We use %>% (magrittr pipe) throughout this course
  • R 4.1+ also has a native pipe |> — same idea, no import needed
  • You’ll see both in the wild; they are interchangeable for most purposes

Data Frames vs Tibbles vs data.tables

You’ll encounter all three — they’re all rectangular data, with different trade-offs:

              data.frame     tibble           data.table
Package       base R         tidyverse        data.table
Print         all rows       fits screen      fits screen
Speed         base           base             very fast
Syntax        df[row, col]   dplyr verbs      dt[i, j, by]
Best for      small data     tidy pipelines   large data
  • tibble is a data.frame with nicer defaults
  • data.table can modify in place (with := and the set*() functions), avoiding copies; crucial for memory on large data
  • We’ll use all three; Module 03 goes deeper

Quarto: Literate Programming

  • Quarto is the next generation of R Markdown
    • supports R, Python, Julia, and Observable JS
    • renders to HTML, PDF, Word, slides (reveal.js), websites, books, …
  • A single .qmd file combines prose, code, and output
  • Replaces R Markdown (.Rmd) for new projects
  • Learn more: https://quarto.org

Quarto: Minimal Example

---
title: "My Analysis"
format: html
---

## Data

```{r}
library(data.table)
library(magrittr)  # provides %>%; data.table and ggplot2 do not export it
as.data.table(mtcars) %>% head()
```

## Plot

```{r}
library(ggplot2)
mtcars %>% ggplot(aes(wt, mpg)) + geom_point()
```
# Render from the terminal
quarto render analysis.qmd           # → analysis.html
quarto render analysis.qmd --to pdf  # → analysis.pdf

Wrap Up

  • AI agents use shell, R, git, Make — learning the toolkit means understanding what they do
  • Shell essentials, Make, Git, R basics, Quarto
  • Next session: Working with large structured data

Further reading