Data Science for Economists
2026-03-01
Event: 13 March 2024 – European Parliament adopts the EU Artificial Intelligence Act.
Google (AI deployer)

Stock rose – less strict regulation than expected?
Nvidia (AI infrastructure)

Stock fell – signal of weaker AI demand ahead?
Why do two companies involved in AI move in opposite directions?
How can we tell if an event truly changed something?
Our toolkit:
We start from time. Then move to causality.
Why events in time matter
Event data records, for each event, which unit it concerns, when it happened, and what was measured.
lubridate: parsing, aligning, and manipulating timestamps
\[ \boxed{\;\text{datum} = (\text{ID},\, \textcolor{#e5b567}{t},\, \textcolor{#9e86c8}{\text{attributes}})\;} \]
From now on every method we use must respect this ordering.
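A minimal sketch of the (ID, t, attributes) structure and the ordering it implies, written in plain Python (standard library only) rather than the R tools used elsewhere in these slides. The class name `Datum`, the field names, and all values are hypothetical and made up for illustration.

```python
from datetime import datetime
from typing import NamedTuple


class Datum(NamedTuple):
    unit_id: str      # ID: which unit the observation belongs to
    t: datetime       # t: when it was observed
    attributes: dict  # everything else measured at time t


# Hypothetical observations, deliberately out of order (values are made up).
raw = [
    Datum("NVDA", datetime(2024, 3, 14), {"close": 101.0}),
    Datum("GOOG", datetime(2024, 3, 13), {"close": 50.5}),
    Datum("NVDA", datetime(2024, 3, 13), {"close": 100.0}),
]

# Sort by unit, then by time, so "before" and "after" are well defined.
ordered = sorted(raw, key=lambda d: (d.unit_id, d.t))

for d in ordered:
    print(d.unit_id, d.t.date(), d.attributes)
```

Sorting on the pair (unit, time) is one simple way to make sure that lags, differences, and before/after comparisons never mix up the temporal order within a unit.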
Year (coarsest) --- Quarter --- Month --- Day --- Hour --- Minute --- Second --- Tick (finest)
R represents time with Date (days) and POSIXct/POSIXlt (date-times, seconds).
The lubridate package (tidyverse) provides a grammar for working with them.
Equally spaced
Observations at fixed intervals (daily close, monthly GDP).
Irregular / event-driven
Observations arrive at uneven times (trades, tweets, sensor pings).
Use HF data only when theory needs sub-daily resolution, and always report how you filtered and aligned the raw feed.
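One common way to align an irregular feed to a regular grid is to keep, for each bin, the last value observed at or before the bin's end. The sketch below does this in plain Python (standard library only); the function name `last_obs_per_bin`, the one-minute grid, and the tick values are assumptions chosen for illustration, not a prescribed filtering rule.

```python
from datetime import datetime, timedelta


def last_obs_per_bin(ticks, start, step, n_bins):
    """For each bin end, return the latest value observed at or before it.

    Bins with no new tick carry the previous value forward; bins before
    the first tick get None.
    """
    ordered = sorted(ticks, key=lambda tick: tick[0])
    grid, last_value, i = [], None, 0
    for k in range(1, n_bins + 1):
        bin_end = start + k * step
        # Consume every tick that arrived up to and including this bin end.
        while i < len(ordered) and ordered[i][0] <= bin_end:
            last_value = ordered[i][1]
            i += 1
        grid.append((bin_end, last_value))
    return grid


# Hypothetical irregular ticks (timestamp, price); values are made up.
ticks = [
    (datetime(2024, 3, 13, 9, 0, 5), 100.0),
    (datetime(2024, 3, 13, 9, 0, 50), 100.2),
    (datetime(2024, 3, 13, 9, 2, 30), 99.9),
]

grid = last_obs_per_bin(ticks, datetime(2024, 3, 13, 9, 0), timedelta(minutes=1), 3)
for bin_end, value in grid:
    print(bin_end.time(), value)
```

The second bin contains no trade, so it repeats the value from the first bin. Whether to carry values forward, leave gaps, or drop such bins is exactly the kind of choice that should be reported.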
zoo (Zeileis, Grothendieck)
rollmean(), rollapply()
xts, quantmod
lubridate (part of tidyverse)
Date/POSIX objects
year(), month(), wday()
dplyr, ggplot2
Use zoo for time-series math. Use lubridate to parse, clean, and wrangle timestamps.
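To make the two kinds of operation concrete, here is a plain-Python sketch (standard library only) of what a rolling mean computes and of extracting calendar components from a date. It is an analogue of the idea, not the zoo or lubridate API; the helper `rolling_mean` and the price values are hypothetical.

```python
from datetime import date


def rolling_mean(x, k):
    """Mean of every window of k consecutive values (len(x) - k + 1 results)."""
    if k < 1 or k > len(x):
        raise ValueError("window width k must be between 1 and len(x)")
    return [sum(x[i:i + k]) / k for i in range(len(x) - k + 1)]


# Time-series math: smooth a made-up price series with a 3-period window.
prices = [10.0, 11.0, 12.0, 13.0, 14.0]
smoothed = rolling_mean(prices, 3)
print(smoothed)

# Timestamp wrangling: pull calendar components out of a date.
day = date(2024, 3, 13)
# isoweekday() uses the ISO convention (Monday = 1); other tools may
# number weekdays differently, so check the convention before comparing.
print(day.year, day.month, day.isoweekday())
```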
Isolating causal effects from time series
Event study is probably the oldest and simplest causal inference research design
Fama calls event studies a test of how quickly security prices reflect public information announcements (Fama 1991, p. 1576).
(\(\neq\) Marketing lit: assume market efficiency to measure the value of campaigns, …)
Treatment \(\longrightarrow\) Outcome
A DAG is a map of our model assumptions – not data.
Z
\(\swarrow\) \(\searrow\)
T \(\longrightarrow\) Y
Controlling for Z blocks the confounding path and helps isolate the causal effect.
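A small constructed example can show what "controlling for Z" does. In the plain-Python sketch below (standard library only), the data are built so that the true effect of T on Y is 1 and Z raises both the chance of treatment and the outcome. All numbers are hypothetical, and comparing within strata of Z is only one simple way to control for a discrete confounder.

```python
from statistics import mean

# (z, t) pairs: treatment is more common when z = 1, so Z confounds T and Y.
rows = [(0, 0)] * 3 + [(0, 1)] + [(1, 0)] + [(1, 1)] * 3

# Constructed outcome: true effect of T is 1.0, effect of Z is 2.0, no noise.
data = [(z, t, 1.0 * t + 2.0 * z) for z, t in rows]


def diff_in_means(sample):
    """Mean outcome of treated minus mean outcome of untreated."""
    treated = [y for _, t, y in sample if t == 1]
    control = [y for _, t, y in sample if t == 0]
    return mean(treated) - mean(control)


# Naive comparison ignores Z and mixes the effect of T with the effect of Z.
naive = diff_in_means(data)

# Controlling for Z: compare treated and untreated within each stratum of Z.
# Both strata have the same size here, so a simple average is enough.
adjusted = mean(diff_in_means([r for r in data if r[0] == z]) for z in (0, 1))

print(naive, adjusted)
```

The naive difference is twice the true effect because treated units are disproportionately high-Z units. Within each level of Z, the comparison recovers the effect built into the data.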
Survey to SME: “roughly how much cash (e.g. in savings, checking) do you have access to without seeking further loans or money from family or friends to pay for your business?”
Bartik et al. (2020), The impact of COVID-19 on small business outcomes
Would the firms that went bankrupt have gone bankrupt even without the pandemic?
Boeing stock plunges again after coronavirus bailout quest spooks investors
Would this have happened even without the bailout? What does the red line tell you?
Check more about the bailout here.
On 2 February 2022, Meta (FB) reported that its global daily active users had declined from the previous quarter for the first time, to 1.929 billion from 1.930 billion.
Event Identification:
Pick an estimation period
Pick an observation period
Use the data from the estimation period to estimate a model predicting stock returns in each period:
\[R = \alpha + \beta R_{M} + \epsilon \qquad \hat{R} = E[R \mid R_{M}]\]
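The steps above can be sketched as follows: fit the market model on the estimation period, predict returns in the observation period, and take the abnormal return as actual minus predicted. This is a plain-Python sketch (standard library only). The return series are hypothetical and noise-free, so the fit recovers alpha and beta exactly, which real data would not do. The function name `fit_market_model` is an assumption.

```python
def fit_market_model(stock, market):
    """OLS fit of R = alpha + beta * R_M on the estimation period."""
    n = len(market)
    mean_m = sum(market) / n
    mean_s = sum(stock) / n
    cov = sum((m - mean_m) * (s - mean_s) for m, s in zip(market, stock))
    var = sum((m - mean_m) ** 2 for m in market)
    beta = cov / var
    alpha = mean_s - beta * mean_m
    return alpha, beta


# Estimation period: made-up daily market returns and a stock that follows
# R = 0.001 + 1.5 * R_M exactly.
market_est = [0.010, -0.005, 0.002, 0.007, -0.012, 0.004]
stock_est = [0.001 + 1.5 * m for m in market_est]
alpha, beta = fit_market_model(stock_est, market_est)

# Observation period: the event day and the day after (hypothetical values).
market_obs = [-0.010, 0.003]
stock_obs = [-0.200, 0.005]

expected = [alpha + beta * m for m in market_obs]
abnormal = [r - e for r, e in zip(stock_obs, expected)]
car = sum(abnormal)  # cumulative abnormal return over the observation period

print(alpha, beta)
print(abnormal, car)
```

In this constructed example almost all of the cumulative abnormal return comes from the event day itself. The following day's return is close to what the market model predicts.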
Meta (FB) global daily active users declined from the previous quarter for the first time, to 1.929 billion from 1.930 billion.
What we observe: META’s stock dropped sharply after Feb 2, 2022 – but the abnormal return lasted only 1–2 days.
Why? Efficient Markets Digest News Quickly
Key idea: Abnormal return captures the difference from expected return, not the full price level.
Abnormal return is short-lived because markets are fast. No new surprise, no new abnormal return.
\[ Y_t = \beta_0 + \beta_1 t + \beta_2 \text{After}_t + \beta_3 (t \times \text{After}_t) + \varepsilon_t \]
When to use it? Any intervention that keeps working over time: regulations, infrastructure, training programmes.
Serial correlation is inevitable – report HAC/Newey-West SEs.
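A minimal sketch of the segmented regression above, in plain Python (standard library only). The series is constructed without noise from known coefficients so the fit can be checked by eye. The event date, the coefficient values, and the helper names `solve` and `ols` are all assumptions. The sketch returns point estimates only and does not compute HAC/Newey-West standard errors.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


event_time = 5                      # hypothetical intervention date
times = list(range(10))
after = [1 if t >= event_time else 0 for t in times]

# Constructed outcome: b0 = 10, b1 = 0.5, b2 = 3, b3 = 0.2, no noise.
y = [10 + 0.5 * t + 3 * a + 0.2 * t * a for t, a in zip(times, after)]

# Columns: intercept, t, After_t, t x After_t.
X = [[1.0, t, a, t * a] for t, a in zip(times, after)]
b0, b1, b2, b3 = ols(X, y)

# With raw t in the interaction, the jump at the event date is b2 + b3 * event_time.
jump_at_event = b2 + b3 * event_time
print(b0, b1, b2, b3, jump_at_event)
```

Because the interaction uses raw time, the coefficient on After is the level shift extrapolated back to t = 0. Many applications instead measure time relative to the event date so that this coefficient is the jump at the event itself.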
Policy introduced mid-2010 to improve pre-hospital care for heart attack / stroke.


Source: Taljaard et al. (2014), Int. J. Epidemiology
Classic event study = one unit, one date
\[\text{before} \;|\; \text{event} \;|\; \text{after}\]
When do we need more?
Goal: always find a credible counterfactual.
Example. EU GDPR announcement hits every tech stock on the same day.
We now observe two kinds of variation:
\[ Y_{it} = \beta_i + \beta_1 t + \beta_2 \text{After}_t + \beta_3\, t \times \text{After}_t + \varepsilon_{it} \]
Key question: which variation identifies \(\beta_3\)?
fixest
Exploiting treated vs. control variation
Idea: Compare the change in outcomes for a treated group to the change for a control group.
Two-period, two-group notation:
\[ \underbrace{\delta_1}_{\text{Treatment effect}} = \bigl(\overline{Y}_{2,\text{treat}} - \overline{Y}_{2,\text{control}}\bigr) - \bigl(\overline{Y}_{1,\text{treat}} - \overline{Y}_{1,\text{control}}\bigr) \]
| Parameter | Meaning |
|---|---|
| \(\beta_0\) | baseline in control |
| \(\beta_1\) | baseline gap (treat vs. control) |
| \(\delta_0\) | common time shock |
| \(\delta_1\) | causal effect of treatment |
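The two-period, two-group calculation can be carried out directly from four group means. The plain-Python sketch below (standard library only) assumes the parameters in the table belong to the usual regression \(Y = \beta_0 + \beta_1 \text{Treat} + \delta_0 \text{After} + \delta_1 (\text{Treat} \times \text{After}) + \varepsilon\), which is not written out above. All outcome values are hypothetical.

```python
from statistics import mean

# Hypothetical outcomes by group and period (values are made up).
control_pre = [10.0, 12.0]
control_post = [13.0, 15.0]
treat_pre = [14.0, 16.0]
treat_post = [21.0, 23.0]

y1_control, y2_control = mean(control_pre), mean(control_post)
y1_treat, y2_treat = mean(treat_pre), mean(treat_post)

beta0 = y1_control                 # baseline in control
beta1 = y1_treat - y1_control      # baseline gap (treat vs. control)
delta0 = y2_control - y1_control   # common time shock

# Difference-in-differences: post-period gap minus pre-period gap.
delta1 = (y2_treat - y2_control) - (y1_treat - y1_control)

print(beta0, beta1, delta0, delta1)
```

In this saturated two-by-two setting the four parameters reproduce the four group means exactly. For example, the treated post-period mean equals the sum of all four.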
\(\beta\) is the (causal) effect on employment of the increase in the national minimum wage (NMW) from the youth to the adult rate.
Heterogeneous timing and robust estimators
Everyone is treated, but treatment length differs by group.
Use case: policy is introduced in many different states during many different time periods.
Hoynes et al. (2016) use staggered roll-out as their identification strategy to assess the long-run effects of childhood access to the safety net.
fixest::sunab()
Classical two-way fixed effects (TWFE) can be biased with staggered treatment timing (Goodman-Bacon 2021; de Chaisemartin & d’Haultfoeuille 2020).
Sun & Abraham (2021) propose an interaction-weighted estimator, implemented in fixest as sunab().
This automatically handles heterogeneous treatment effects across cohorts.
For a comprehensive overview of modern event-study / DiD methods, see:
Roth, J., Sant’Anna, P.H.C., Bilinski, A., & Poe, J. (2023). “What’s Trending in Difference-in-Differences? A Synthesis of the Recent Econometrics Literature.” Journal of Econometrics, 235(2), 2218–2244.
Key takeaways:
Sensitivity analysis (HonestDiD package) can bound the bias
Heterogeneity-robust estimators are available (sunab(), did, did2s)
Structural models as counterfactuals
This section is optional / advanced material.
Imagine you want to evaluate the effect of a Regional Trade Agreement (RTA) between countries on their trade flows.
What are Regional Trade Agreements? A regional trade agreement is a treaty signed by two or more countries to encourage the free movement of goods and services across its members' borders.
\(\text{Trade flows}_{ij} = \text{Size}_{i} \times \text{Size}_{j} \times \text{Frictions to trade}_{ij}\)
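Taking logs of this multiplicative form shows why a log-linear regression is a natural first estimator. A sketch, assuming trade flows, sizes, and frictions are all strictly positive:

\[ \ln \text{Trade flows}_{ij} = \ln \text{Size}_{i} + \ln \text{Size}_{j} + \ln \text{Frictions to trade}_{ij} \]

In specifications with origin and destination fixed effects, the two size terms are typically absorbed by those effects, while frictions are proxied by distance and other bilateral controls. Pairs with zero trade have no logarithm, so they drop out of any log-linear specification.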
OLS Estimation:
\[\ln X_{ij} = \beta_d \ln \text{Dist}_{ij} + \text{Controls}_{ij} + M_i + X_j + \epsilon_{ij}\]
PPML Estimation:
\[X_{ij} = \exp(\beta_d \ln \text{Dist}_{ij} + \text{Controls}_{ij} + M_i + X_j) + \epsilon_{ij}\]
PPML advantages: handles zeros, consistent with equilibrium conditions (Fally 2015), robust to heteroskedasticity. Disadvantage: prone to small sample bias.
Intensive margin (export value)

Head and Mayer (2014)
Extensive margin (non-zero trade flows)

Head and Mayer (2014)


Source: Yotov et al.
Summary and further reading
fixest package: Documentation
Setup: We want to understand the relationship between gender and ability. In the general population, they are uncorrelated.
But what if we restrict attention to a single occupation?
Gender \(\qquad\) Ability
\(\searrow\) \(\qquad\) \(\swarrow\)
Occupation
Takeaway: Conditioning on a collider (like occupation) can induce bias. Even if gender and ability are independent in the population, they appear negatively correlated in a biased sample.
Source: Cunningham, Causal Inference: The Mixtape (2021), Ch. 3.5.
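The collider logic can be checked on a tiny constructed population. In the plain-Python sketch below (standard library only), a 0/1 gender indicator and an ability score are independent by construction. The selection rule into the occupation is a made-up assumption that demands higher ability from one group, and the helper `correlation` is written out by hand.

```python
from math import sqrt


def correlation(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / sqrt(var_x * var_y)


# Population: every combination of gender (0/1) and ability (0..3) once,
# so the two are uncorrelated by construction.
population = [(g, a) for g in (0, 1) for a in range(4)]

# Hypothetical selection rule: entry requires ability + 2 * gender >= 3,
# i.e. the group coded 0 needs higher ability to enter the occupation.
in_occupation = [(g, a) for g, a in population if a + 2 * g >= 3]

pop_corr = correlation(population)
occ_corr = correlation(in_occupation)
print(pop_corr, occ_corr)
```

Within the occupation the correlation turns negative even though nothing links gender and ability in the population: only high-ability members of the disadvantaged group are selected in.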