08. Time as Data / Event Studies

Data Science for Economists

Irene Iodice

2026-03-01

One vote, two reactions: What happened on March 13?

Event: 13 March 2024 – European Parliament adopts the EU Artificial Intelligence Act.

Google (AI deployer)

Stock rose – less strict regulation than expected?

Nvidia (AI infrastructure)

Stock fell – signal of weaker AI demand ahead?

Why do two companies involved in AI move in opposite directions?

Learn to answer questions like this

How can we tell if an event truly changed something?

  • Would Nvidia’s stock have fallen even without the AI Act?
  • Was Google’s rise part of a positive trend – or is it a specific market signal?
  • When is a price movement just noise – and when is it meaningful?

Our toolkit:

  • Handle and align time-stamped data
  • Use event studies to estimate causal impact
  • Build and test counterfactuals

We start from time. Then move to causality.

Why events in time matter

Why study events in time?

Event data are observations of an outcome recorded around a dated event, so that the outcome can be compared before and after it.

  • Policy, shocks, news \(\to\) causal impact on markets, firms, outcomes.
  • Requires two lenses: when the event happened and how the series responded.
  • Today: (basic) workflows for time handling & causal inference.

Learning Objectives

  1. Parse, manipulate, and align timestamps in R (lubridate).
  2. Distinguish event, time trend, and outcome.
  3. Execute simple and regression-based event studies.
  4. Know when to pivot to DiD, staggered adoption, or RDD.
  5. Build structural counterfactuals with the gravity model.

Parsing, aligning, and manipulating timestamps

What is a time-stamped datum?

  • Cross-section: values observed once per unit (firm, county, tweet).
  • Time record: each observation carries a timestamp \(\;\Rightarrow\;\) ordering, lags, windows.

\[ \boxed{\;\text{datum} = (\text{ID},\, \textcolor{#e5b567}{t},\, \textcolor{#9e86c8}{\text{attributes}})\;} \]

From now on every method we use must respect this ordering.

Granularity of time

  Year --- Quarter --- Month --- Day --- Hour --- Minute --- Second --- Tick
  |                                                                       |
  coarsest                                                           finest
  • Choose the coarsest frequency that still captures the causal effect.
  • Finer \(\Rightarrow\) more observations, but also noise and dependence.

Date–time objects in R

  • Base R distinguishes Date (days) and POSIXct/POSIXlt (date–times, seconds).
  • The lubridate package (tidyverse) provides a grammar for working with them.
  • Always store timestamps with an explicit time zone – preferably UTC.

Creating date and date–time values
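A minimal sketch of creating dates and date-times with lubridate, using the AI Act vote as the running example. The clock time 12:30 UTC is a hypothetical value chosen for illustration, not the actual time of the vote.

```r
library(lubridate)

# Parse the same calendar day written in two different conventions
vote_day <- ymd("2024-03-13")      # year-month-day
same_day <- dmy("13/03/2024")      # day-month-year

# Build a date from numeric components
built <- make_date(year = 2024, month = 3, day = 13)

# Parse a date-time with an explicit time zone (hypothetical clock time)
vote_time <- ymd_hms("2024-03-13 12:30:00", tz = "UTC")

# Same instant, displayed on the Brussels clock
brussels <- with_tz(vote_time, "Europe/Brussels")

print(vote_day)
print(brussels)
```

`with_tz()` changes only how the instant is displayed; use `force_tz()` when the stored clock time was recorded in the wrong zone and the instant itself has to change.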

Extracting & modifying components
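A short sketch of the accessor functions, again with a hypothetical 12:30 UTC timestamp on the day of the vote. The same functions work as setters when they appear on the left of an assignment.

```r
library(lubridate)

vote_time <- ymd_hms("2024-03-13 12:30:00", tz = "UTC")  # hypothetical clock time

# Extract components
yr  <- year(vote_time)                    # 2024
qtr <- quarter(vote_time)                 # 1
doy <- yday(vote_time)                    # day of the year
dow <- wday(vote_time, week_start = 1)    # 1 = Monday, so 3 = Wednesday

# Modify components in place: move the timestamp to 09:00 on the same day
session_open <- vote_time
hour(session_open)   <- 9
minute(session_open) <- 0

print(c(year = yr, quarter = qtr, yday = doy, wday = dow))
print(session_open)
```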

Spans of time: durations, periods, intervals
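The three span types differ in how they treat the calendar. A sketch, assuming a start time just before the EU switch to summer time on 31 March 2024 and a hypothetical event window of five days either side of the vote:

```r
library(lubridate)

start <- ymd_hms("2024-03-30 12:00:00", tz = "Europe/Brussels")

# Duration: an exact number of seconds (24 hours later on the physical clock)
plus_duration <- start + ddays(1)

# Period: the same wall-clock time on the next calendar day
plus_period <- start + days(1)

# Month arithmetic: %m+% rolls back to the last valid day instead of returning NA
end_of_feb <- ymd("2024-01-31") %m+% months(1)

# Interval: a span anchored at two specific dates (hypothetical event window)
event_window <- interval(ymd("2024-03-08"), ymd("2024-03-18"))
window_days  <- time_length(event_window, "day")
vote_inside  <- ymd("2024-03-13") %within% event_window

print(plus_duration)   # one hour "later" on the clock because of the DST switch
print(plus_period)
print(window_days)
```

Durations suit physical elapsed time (trading latency, sensor gaps); periods suit calendar logic (same time tomorrow, one month later).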

Rounding and aligning timestamps
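Rounding is the usual way to align an irregular stream to a regular grid. A sketch with three made-up price ticks; the prices and timestamps are illustrative assumptions:

```r
library(lubridate)

# Irregular ticks (hypothetical values)
ticks <- ymd_hms(c("2024-03-13 09:30:04",
                   "2024-03-13 09:30:58",
                   "2024-03-13 09:31:10"), tz = "UTC")
price <- c(100.0, 100.4, 99.9)

# Assign every tick to the minute bin it falls in
minute_bin <- floor_date(ticks, "minute")

# Resample: keep the last price observed in each minute
last_per_minute <- tapply(price, format(minute_bin, "%H:%M"),
                          function(p) p[length(p)])

# Calendar alignment for lower-frequency merges
month_start <- floor_date(ymd("2024-03-13"), "month")
week_start  <- floor_date(ymd("2024-03-13"), "week", week_start = 1)  # Monday

print(last_per_minute)
print(month_start)
print(week_start)
```

Taking the last observation per bin is only one resampling rule; means, volume-weighted averages, or first observations give different series, which is where aggregation bias can enter.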

Regular vs. irregular sampling

Equally spaced

Observations at fixed intervals (daily close, monthly GDP).

Irregular / event-driven

Observations arrive at uneven times (trades, tweets, sensor pings).

  • Irregular streams often need resampling.
  • Beware of aggregation bias and missing-data artefacts.

High-frequency data: promises & pitfalls

  • Volume: millions of rows \(\Rightarrow\) storage, speed, and parallel algorithms.
  • Micro-structure noise: bid-ask bounce, timestamp jitter.
  • Simultaneity: many units react within milliseconds.
  • Multiple hypothesis risk: easy to find spurious “events”.

Use HF data only when theory needs sub-daily resolution, and always report how you filtered and aligned the raw feed.

zoo vs. lubridate: Different Tools for Time

zoo (Zeileis, Grothendieck)

  • Time-indexed vectors and matrices
  • Designed for irregular or financial time series
  • Fast rolling stats: rollmean(), rollapply()
  • Plays well with xts, quantmod
  • Base R-style syntax

lubridate (part of tidyverse)

  • Simplifies parsing and modifying Date/POSIX objects
  • Grammar for extracting: year(), month(), wday()
  • Useful for aligning dates, durations, and intervals
  • Integrates naturally with dplyr, ggplot2
  • Ideal for tidy data workflows

Use zoo for time-series math. Use lubridate to parse, clean, and wrangle timestamps.
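As an illustration of the division of labour, a small sketch of rolling statistics with zoo on five made-up daily closing prices:

```r
library(zoo)

# Hypothetical daily closing prices indexed by calendar date
dates  <- as.Date("2024-03-11") + 0:4
prices <- zoo(c(100, 102, 101, 105, 107), order.by = dates)

# 3-day trailing moving average; the first two days lack a full window
ma3 <- rollmean(prices, k = 3, align = "right", fill = NA)

# rollapply() rolls any function: here the 3-day trailing maximum
max3 <- rollapply(prices, width = 3, FUN = max, align = "right", fill = NA)

print(merge(prices, ma3, max3))
```

`align = "right"` keeps the window backward-looking, which matters in event studies: a centred window would leak post-event prices into pre-event averages.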

Isolating causal effects from time series

Event Studies

The event study is probably the oldest and simplest causal inference research design

  • Effect of stock splits on stock prices (Dolley 1933; MacKinlay 1997)
  • The information content of earnings announcements (Ball and Brown 1968)

Fama calls event studies a test of how quickly security prices reflect public information announcements (Fama 1991, p. 1576).

(Contrast with the marketing literature, which assumes market efficiency in order to measure the value of campaigns, …)

DAGs: Visualising Causal Assumptions

  • DAGs help us visualize assumptions about causal structure.
  • Each arrow encodes a causal relationship between variables.
  • They help identify confounders, mediators, and colliders.
  • Rule: No cycles – a variable cannot cause itself, directly or indirectly.

Treatment \(\longrightarrow\) Outcome

A DAG is a map of our model assumptions – not data.

Confounding and the Back-Door Criterion

  • A back-door path is a non-causal path from Treatment to Outcome that could bias our estimates.
  • To identify the causal effect, we must block all such paths – usually by controlling for confounders.
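  • A set of variables satisfies the back-door criterion if it blocks every back-door path and contains no descendant of the treatment (so it neither controls away part of the effect nor opens a collider path).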

Z

\(\swarrow\) \(\searrow\)

T \(\longrightarrow\) Y

Controlling for Z blocks the confounding path and helps isolate the causal effect.
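A small simulation makes the point concrete. It is a sketch under assumed numbers: the true effect of treatment is set to 1, and Z both raises the chance of treatment and raises the outcome.

```r
set.seed(1)
n <- 5000

z     <- rnorm(n)                        # confounder Z
treat <- as.numeric(z + rnorm(n) > 0)    # Z makes treatment more likely
y     <- 1 * treat + 2 * z + rnorm(n)    # true causal effect of treatment is 1

# Back-door path T <- Z -> Y left open: estimate mixes the effect with Z
naive <- coef(lm(y ~ treat))["treat"]

# Back-door path blocked by controlling for Z
adjusted <- coef(lm(y ~ treat + z))["treat"]

print(round(c(naive = unname(naive), adjusted = unname(adjusted)), 2))
```

The naive coefficient is far above 1 because treated units also have higher Z; once Z is in the regression the estimate is close to the true effect.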

The impact of COVID-19 on small business

  • Treatment = Pandemic \(\rightarrow\) Outcome = Survival
    • Time series: comparing pre- and post-pandemic outcomes
  • Pandemic \(\leftarrow\) After Event \(\leftarrow\) Time \(\rightarrow\) Outcome
    • All the stuff that changes over time independently of the Pandemic

Financial Fragility of Small Business

Survey question to SMEs: “roughly how much cash (e.g. in savings, checking) do you have access to without seeking further loans or money from family or friends to pay for your business?”

Bartik et al. (2020), The impact of COVID-19 on small business outcomes

Counterfactual Question

Would the firms that went bankrupt have gone bankrupt even without the pandemic?

  1. Assume that whatever was going on before the event would have continued if not for the treatment; this gives the predicted (counterfactual) outcome.
  2. Measure how the actual outcome deviates from that prediction.
  3. Interpret the extent of the deviation as the effect of treatment.
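The three steps can be sketched on simulated data. Everything here is an assumption made for illustration: a linear pre-trend, an event after period 30, and a true drop of 3 units.

```r
set.seed(42)

t     <- 1:40
after <- as.numeric(t > 30)                       # event after period 30 (assumed)
y     <- 10 + 0.5 * t - 3 * after + rnorm(40, sd = 0.5)
dat   <- data.frame(t, after, y)

# Step 1: model "what was going on before" using pre-event data only
pre_fit <- lm(y ~ t, data = subset(dat, after == 0))

# Step 2: extrapolate that trend into the post-event periods
dat$y_hat <- predict(pre_fit, newdata = dat)

# Step 3: the average deviation from the prediction is the estimated effect
effect <- with(subset(dat, after == 1), mean(y - y_hat))

print(round(effect, 2))
```

The estimate is only as good as step 1: if the pre-trend would have bent anyway, the deviation is attributed to the event by mistake.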

Pre-Trend Analysis

Practical Corner: Boeing bailout

Boeing stock plunges again after coronavirus bailout quest spooks investors

Would this have happened even without the bailout? What does the red line tell you?

Check more about the bailout here.

Practical Corner: Getting the data

Practical Corner: Plotting

Event Study Design with Stock Markets: Meta

On February 2nd 2022, Meta (FB) reported that its global daily active users had declined from the previous quarter for the first time, to 1.929 billion from 1.930 billion.

Event Study Design: Steps

  1. Event Identification:

    • e.g., dividends, M&A, stock buybacks, laws or regulation, privatization vs. nationalization, celebrity endorsements, name changes, or brand extensions.
    • Events must affect either cash flows or the value of the firm (Sorescu, Warren, and Ertekin 2017, p. 191)
  2. Pick an estimation period: a window before the event, used to fit the model of expected returns

  3. Pick an observation period: a window around the event, in which actual returns are compared with that prediction

Event Study Design: Abnormal Returns

Use the data from the estimation period to estimate a model predicting stock returns in each period:

  1. Mean-adjusted returns model: average in the estimation period \(\hat{R}=\bar{R}\)
  2. Market-adjusted returns model: use the market return in each period \(\hat{R}=R_{M}\)
  3. Risk-adjusted returns model: fit the relation between the stock’s return and the market return in the estimation period

\[R = \alpha + \beta R_{M} + \epsilon \qquad \hat{R} = E[R \mid R_{M}]\]

  • Calculate abnormal return \(AR = R - \hat{R}\)
  • Is AR constant during the observation period?
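The three models can be compared side by side on simulated returns. This is a sketch, not the Meta data: the return process, the event day (111), the size of the surprise (8 percentage points), and the two windows are all assumed values.

```r
set.seed(7)

n_days <- 120
r_m <- rnorm(n_days, mean = 0.0005, sd = 0.01)          # market return
r   <- 0.0002 + 1.2 * r_m + rnorm(n_days, sd = 0.005)   # stock return, beta = 1.2

event <- 111                     # hypothetical event day
r[event] <- r[event] - 0.08      # simulated one-day negative surprise

ret <- data.frame(day = 1:n_days, r = r, r_m = r_m)
est <- 1:100                     # estimation period
obs <- 106:116                   # observation period

# 1. Mean-adjusted: expected return is the estimation-period average
ar_mean <- ret$r[obs] - mean(ret$r[est])

# 2. Market-adjusted: expected return is the market return that day
ar_market <- ret$r[obs] - ret$r_m[obs]

# 3. Risk-adjusted: expected return from R = alpha + beta * R_M
fit     <- lm(r ~ r_m, data = ret[est, ])
ar_risk <- ret$r[obs] - predict(fit, newdata = ret[obs, ])

ar <- data.frame(day = obs, ar_mean, ar_market, ar_risk)
print(round(ar, 3))
```

In this setup all three series show a single large negative abnormal return on the event day and values near zero elsewhere; the risk-adjusted series is the least noisy because it removes the stock's market exposure.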

Code: Estimation and observation data

Code: Computing abnormal returns

Meta returns around the announcement

Meta’s (FB) global daily active users declined from the previous quarter for the first time, to 1.929 billion from 1.930 billion.

Why is the Abnormal Return so Short-lived?

What we observe: META’s stock dropped sharply after Feb 2, 2022 – but the abnormal return lasted only 1–2 days.

Why? Efficient Markets Digest News Quickly

  • Prices adjust immediately when new public information arrives.
  • The drop reflects a one-time surprise (decline in active users).
  • After the shock, returns revert to normal levels.

Key idea: Abnormal return captures the difference from expected return, not the full price level.

  • The price may stay low.
  • But the “shock” only happens once – when the news hits.

Abnormal return is short-lived because markets are fast. No new surprise, no new abnormal return.

Modelling long-lasting effects

\[ Y_t = \beta_0 + \beta_1 t + \beta_2 \text{After}_t + \beta_3 (t \times \text{After}_t) + \varepsilon_t \]

  • \(\beta_1\): pre-event trend.
  • \(\beta_2\): one-time jump.
  • \(\beta_3\): change in slope \(\Rightarrow\) persistent effect.

When to use it? Any intervention that keeps working over time: regulations, infrastructure, training programmes.

Serial correlation is almost inevitable in time-series residuals, so report HAC (Newey-West) standard errors.
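A sketch of the segmented regression on simulated data with autocorrelated errors. The coefficients, the event date (after period 30), and the Newey-West lag of 4 are assumptions chosen for illustration; the HAC step runs only if the sandwich and lmtest packages are installed.

```r
set.seed(3)

t     <- 1:60
after <- as.numeric(t > 30)                              # event after period 30
eps   <- as.numeric(arima.sim(list(ar = 0.5), n = 60, sd = 0.5))  # AR(1) errors
y     <- 50 + 0.2 * t - 2 * after - 0.3 * (t * after) + eps

# Y_t = b0 + b1 t + b2 After_t + b3 (t x After_t) + e_t
fit <- lm(y ~ t + after + t:after)
print(round(coef(fit), 3))

# HAC (Newey-West) standard errors, if the packages are available
if (requireNamespace("sandwich", quietly = TRUE) &&
    requireNamespace("lmtest", quietly = TRUE)) {
  hac <- sandwich::NeweyWest(fit, lag = 4, prewhite = FALSE)
  print(lmtest::coeftest(fit, vcov. = hac))
}
```

The coefficient on `t:after` recovers the change in slope; with positively autocorrelated errors the default OLS standard errors would typically be too small.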

Case study: UK ambulance quality-of-care policy

Policy introduced mid-2010 to improve pre-hospital care for heart attack / stroke.

  • Clear kink \(\to\) \(\beta_3 < 0\): mortality trend fell faster post-policy.
  • Taljaard et al. (2014) estimate HAC SEs to confirm significance.

Source: Taljaard, et al., 2014, Int. J. Epidemiology

From Simple Event Study to Other Designs

Classic event study = one unit, one date

\[\text{before} \;|\; \text{event} \;|\; \text{after}\]

When do we need more?

  • Many treated dates (staggered rollout) \(\to\) use staggered DiD
  • Treated & control groups \(\to\) use DiD or synthetic control
  • Treatment assigned by a cutoff (age, score) \(\to\) use RDD
  • No clean control – need theory \(\to\) use structural models

Goal: always find a credible counterfactual.

One shock, many firms: what changes?

Example. EU GDPR announcement hits every tech stock on the same day.

We now observe two kinds of variation:

  1. Time – before vs. after the announcement
  2. Cross-section – some firms more exposed than others

\[ Y_{it} = \beta_i + \beta_1 t + \beta_2 \text{After}_t + \beta_3\, t \times \text{After}_t + \varepsilon_{it} \]

  • \(\beta_i\) soaks up level differences between firms.
  • \(\beta_3\) captures whether the slope changes after the shock.

Key question: which variation identifies \(\beta_3\)?

Simulating an event study with fixest
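A minimal simulation sketch with fixest. The setup is assumed for illustration: 200 firms observed from 5 periods before to 5 periods after a common shock, half of them exposed, with a true effect of 2 on exposed firms from event time 0 onwards. The names `panel`, `exposed`, and `rel_time` are hypothetical.

```r
library(fixest)
set.seed(10)

n_firms <- 200
panel <- expand.grid(firm = 1:n_firms, rel_time = -5:5)
panel$exposed <- as.numeric(panel$firm <= n_firms / 2)   # first half exposed

firm_fe <- rnorm(n_firms)                                # firm level differences
panel$y <- firm_fe[panel$firm] + 0.1 * panel$rel_time +  # common trend
  2 * panel$exposed * (panel$rel_time >= 0) +            # true effect = 2
  rnorm(nrow(panel))

# One coefficient per event time (relative to t = -1) for exposed firms;
# firm and time fixed effects absorb levels and common shocks
es <- feols(y ~ i(rel_time, exposed, ref = -1) | firm + rel_time,
            data = panel, cluster = ~firm)

print(round(coef(es), 2))
# iplot(es)   # plots the event-study coefficients with confidence intervals
```

Because the shock hits every firm on the same day, the time fixed effects absorb the common before/after change; the event-time coefficients are identified from the difference between exposed and unexposed firms.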

Event study coefficients

Exploiting treated vs. control variation

Difference-in-Differences (DiD)

Idea: Compare the change in outcomes for a treated group to the change for a control group.

  • Controls for unit fixed effects via before/after difference.
  • Controls for common time shocks via treated vs. control difference.

Two-period, two-group notation:

\[ \underbrace{\delta_1}_{\text{Treatment effect}} = \bigl(\overline{Y}_{2,\text{treat}} - \overline{Y}_{2,\text{control}}\bigr) - \bigl(\overline{Y}_{1,\text{treat}} - \overline{Y}_{1,\text{control}}\bigr) \]

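Equivalent regression form, with \(\text{Treat}_i\) a treated-group dummy and \(\text{Post}_t\) a second-period dummy:

\[ Y_{it} = \beta_0 + \beta_1 \text{Treat}_i + \delta_0 \text{Post}_t + \delta_1 (\text{Treat}_i \times \text{Post}_t) + \varepsilon_{it} \]

  Parameter        Meaning
  \(\beta_0\)      baseline in control
  \(\beta_1\)      baseline gap (treat vs. control)
  \(\delta_0\)     common time shock
  \(\delta_1\)     causal effect of treatment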
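A sketch showing that the difference of group means and the interaction coefficient in a regression give the same number. The data are simulated with an assumed true effect of 2, a baseline gap of 0.5, and a common time shock of 0.8.

```r
set.seed(5)

n     <- 2000
treat <- rbinom(n, 1, 0.5)     # treated-group dummy
post  <- rbinom(n, 1, 0.5)     # second-period dummy
y     <- 1 + 0.5 * treat + 0.8 * post + 2 * treat * post + rnorm(n)

# DiD from the four group means
m <- tapply(y, list(treat = treat, post = post), mean)
did_means <- (m["1", "1"] - m["0", "1"]) - (m["1", "0"] - m["0", "0"])

# DiD as the interaction coefficient in a regression
did_reg <- coef(lm(y ~ treat * post))["treat:post"]

print(round(c(means = unname(did_means), regression = unname(did_reg)), 3))
```

The two estimates coincide because the regression with both dummies and their interaction is saturated; the regression form is what generalises to covariates and clustered standard errors.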

Card & Krueger (1994): Minimum Wage DiD

Regression Discontinuity Design (RDD)

RDD: National Minimum Wage at age threshold

  • \(y_i\) is an employment-related measure for individual \(i\) (e.g. a dummy indicating employment status)
  • \(f(\text{age}, a)\) is a flexible polynomial in age with parameters \(a\)
  • \(X_i\) is a set of covariates for individual \(i\)

\(\beta\) is the (causal) effect on employment of the increase in the NMW from the youth to the adult rate.
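The definitions above are consistent with a specification of the following form. The indicator \(D_i\), the cutoff age \(c\), the coefficient vector \(\gamma\), and the error term \(\varepsilon_i\) are notation assumed here for illustration:

\[ y_i = \beta\, D_i + f(\text{age}_i, a) + X_i \gamma + \varepsilon_i, \qquad D_i = \mathbf{1}[\text{age}_i \ge c] \]

Identification comes from comparing individuals just below and just above \(c\): the polynomial \(f\) absorbs the smooth relation between age and employment, so any jump at the cutoff is attributed to the change in the wage rate.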

The effect of the threshold on employment

Heterogeneous timing and robust estimators

Multiple affected groups: Staggered adoption

Everyone is treated, but treatment length differs by group.

Use case: policy is introduced in many different states during many different time periods.

Hoynes et al. (2016) use staggered roll-out as their identification strategy to assess the long-run effects of childhood access to the safety net.

Staggered adoption: Exposure varies by cohort

Modern staggered DiD with fixest::sunab()

Classical two-way fixed effects (TWFE) can be biased with staggered treatment timing (Goodman-Bacon 2021; de Chaisemartin & d’Haultfoeuille 2020).

Sun & Abraham (2021) propose an interaction-weighted estimator implemented in fixest:
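A minimal sketch of such a call, assuming the simulated `base_stagg` panel that ships with fixest (columns `id`, `year`, `year_treated`, `x1`, `y`) rather than any real policy data:

```r
library(fixest)

# Simulated staggered-adoption panel bundled with fixest
data(base_stagg, package = "fixest")

# sunab(cohort, period): cohort = year of first treatment, period = calendar year
res_sa <- feols(y ~ x1 + sunab(year_treated, year) | id + year,
                data = base_stagg)

# Dynamic (event-time) effects
print(coef(res_sa))

# Static summary: average treatment effect on the treated
print(summary(res_sa, agg = "att"))

# iplot(res_sa)   # event-study plot of the aggregated coefficients
```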

This automatically handles heterogeneous treatment effects across cohorts.

Practitioner’s guide: Roth et al. (2023)

For a comprehensive overview of modern event-study / DiD methods, see:

Roth, J., Sant’Anna, P.H.C., Bilinski, A., & Poe, J. (2023). “What’s Trending in Difference-in-Differences? A Synthesis of the Recent Econometrics Literature.” Journal of Econometrics, 235(2), 2218–2244.

Key takeaways:

  • Pre-trend tests have low power – a flat pre-trend does not guarantee parallel trends
  • Sensitivity analysis (e.g., HonestDiD package) can bound the bias
  • With staggered timing, use robust estimators (sunab(), did, did2s)
  • Report both static (average) and dynamic (event-time) treatment effects

Structural models as counterfactuals

This section is optional / advanced material.

Economic models as counterfactuals

Imagine you want to evaluate the effect of a Regional Trade Agreement (RTA) between countries on their trade flows.

What are Regional Trade Agreements? A regional trade agreement is a treaty signed by two or more countries to encourage the free movement of goods and services across the borders of its members.

What are the drivers of trade?

The Gravity Model of Trade

Counterfactuals in Trade

\(\text{Trade flows}_{ij} = \text{Size}_{i} \times \text{Size}_{j} \times \text{Frictions to trade}_{ij}\)

  • Size = \(\frac{Y_i E_j}{Y}\): exporter \(i\)’s output times importer \(j\)’s expenditure, relative to world output \(Y\)
  • Frictions:
    1. Bilateral trade cost between partners \(i\) and \(j\) (\(t_{ij}\)): typically approximated by distance, tariffs, etc.
    2. Inward multilateral resistance (\(P_j\)): importer \(j\)’s ease of market access (Anderson & van Wincoop 2003).
    3. Outward multilateral resistance (\(\Pi_i\)): exporter \(i\)’s ease of market access.
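Put together, the components above correspond to the standard structural gravity equation in the form popularised by Anderson & van Wincoop (2003). The trade elasticity parameter \(\sigma > 1\) (elasticity of substitution) is notation assumed here, since it is not defined on the slide:

\[ X_{ij} = \frac{Y_i E_j}{Y} \left( \frac{t_{ij}}{\Pi_i P_j} \right)^{1-\sigma} \]

Because \(1-\sigma < 0\), higher bilateral costs \(t_{ij}\) reduce trade, while higher multilateral resistance (\(\Pi_i\), \(P_j\)) makes any given bilateral link relatively more attractive. A counterfactual changes \(t_{ij}\) (for example, by switching on an RTA) and recomputes the flows.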

Reduced-Form Estimation

OLS Estimation:

\[\ln X_{ij} = \beta_d \ln \text{Dist}_{ij} + \text{Controls}_{ij} + M_i + X_j + \epsilon_{ij}\]

PPML Estimation:

\[X_{ij} = \exp(\beta_d \ln \text{Dist}_{ij} + \text{Controls}_{ij} + M_i + X_j) + \epsilon_{ij}\]

PPML advantages: handles zeros, consistent with equilibrium conditions (Fally 2015), robust to heteroskedasticity. Disadvantage: prone to small sample bias.
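A sketch comparing the two estimators on simulated bilateral flows. The data-generating process is an assumption made for illustration (15 hypothetical countries, true distance elasticity of -1, Poisson flows with many zeros), and base R `glm()` stands in for a dedicated routine such as `fixest::fepois()`.

```r
set.seed(11)

countries <- paste0("C", 1:15)
flows <- expand.grid(exporter = countries, importer = countries,
                     stringsAsFactors = FALSE)
flows <- flows[flows$exporter != flows$importer, ]       # drop internal trade

flows$log_dist <- rnorm(nrow(flows), mean = 8, sd = 0.7)
flows$trade <- rpois(nrow(flows), lambda = exp(9 - 1 * flows$log_dist))

# PPML: Poisson pseudo-likelihood in levels, keeps the zero flows
ppml <- glm(trade ~ log_dist + exporter + importer,
            family = quasipoisson(link = "log"), data = flows)

# OLS in logs: zero flows must be dropped because log(0) is undefined
ols <- lm(log(trade) ~ log_dist + exporter + importer,
          data = subset(flows, trade > 0))

cat("zero flows:", sum(flows$trade == 0), "of", nrow(flows), "\n")
print(round(c(ppml = unname(coef(ppml)["log_dist"]),
              ols  = unname(coef(ols)["log_dist"])), 3))
```

The exporter and importer dummies play the role of the fixed effects in the equations above. In this simulation PPML stays close to the true elasticity, while log-linear OLS discards the zero flows and is estimated on a selected sample.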

Meta-analysis of gravity estimates

Intensive margin (export value)

Head and Mayer (2014)

Extensive margin (non-zero trade flows)

Head and Mayer (2014)

Gravity variable estimates & RTA effects

Source: Yotov et al.

The effect of RTAs

Other studies using the Gravity framework

  • Effects of trade liberalizations and trade agreements (e.g. NAFTA between Mexico, the US, and Canada)
  • EU integration
  • Migration flows

Summary and further reading

Sources & further reading

  • Event studies: Nick Huntington-Klein, The Effect, Chapter on Event Studies
  • Modern DiD: Roth et al. (2023), “What’s Trending in Difference-in-Differences?”, Journal of Econometrics
  • Staggered DiD: Sun & Abraham (2021); Goodman-Bacon (2021); de Chaisemartin & d’Haultfoeuille (2020)
  • Gravity estimation: Structural Gravity in R
  • fixest package: Documentation

Appendix

Collider Bias: Gender and Occupation

Setup: We want to understand the relationship between gender and ability. In the general population, they are uncorrelated.

But what if we restrict attention to a single occupation?

  • Let occupation be influenced by both:
    • Gender (e.g., bias in hiring)
    • Ability (e.g., productivity or test scores)
  • Conditioning on occupation \(\Rightarrow\) opens a collider path
  • This induces a spurious negative correlation

Gender \(\qquad\) Ability

\(\searrow\) \(\qquad\) \(\swarrow\)

Occupation

Takeaway: Conditioning on a collider (like occupation) can induce bias. Even if gender and ability are independent in the population, they appear negatively correlated in a biased sample.
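The mechanism can be reproduced in a few lines. This is a sketch with an assumed selection rule: entry into the occupation depends on ability plus a hiring bias that favours one gender (coded `favoured = 1`), and gender and ability are independent by construction.

```r
set.seed(2)
n <- 10000

favoured <- rbinom(n, 1, 0.5)    # 1 = gender favoured by hiring bias (assumed)
ability  <- rnorm(n)             # independent of gender by construction

# Collider: occupation depends on both gender and ability
in_occupation <- (ability + 1 * favoured + rnorm(n)) > 1

cor_population <- cor(favoured, ability)
cor_occupation <- cor(favoured[in_occupation], ability[in_occupation])

print(round(c(population = cor_population, occupation = cor_occupation), 3))
```

In the full sample the correlation is close to zero. Within the occupation it turns negative: members of the disfavoured group needed higher ability to get in, so the selected sample shows a gap that does not exist in the population.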

Source: Cunningham, Causal Inference: The Mixtape (2021), Ch. 3.5.