Causal Inference

Last updated: December 2024

Disclaimer: These are my personal notes compiled for my own reference and learning. They may contain errors, incomplete information, or personal interpretations. While I strive for accuracy, these notes are not peer-reviewed and should not be considered authoritative sources. Please consult official textbooks, research papers, or other reliable sources for academic or professional purposes.

1. Potential Outcomes Framework

For each unit $i$, we have potential outcomes:

$$Y_i(1): \text{outcome if treated}$$

$$Y_i(0): \text{outcome if not treated}$$

The individual treatment effect is:

$$\tau_i = Y_i(1) - Y_i(0)$$

2. Average Treatment Effect (ATE)

The average treatment effect is:

$$\tau = E[Y_i(1) - Y_i(0)] = E[Y_i(1)] - E[Y_i(0)]$$

We observe only one potential outcome per unit, $Y_i = D_i Y_i(1) + (1 - D_i) Y_i(0)$, where $D_i \in \{0, 1\}$ is the treatment indicator. This is the fundamental problem of causal inference.
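To make the definitions concrete, here is a minimal simulation sketch in base R. All numbers (sample size, a true average effect of 2) are made up for illustration; with real data `y0` and `y1` are never both available, and only `d` and `y` would be seen.

```r
# Simulated potential outcomes (hypothetical data, for illustration only)
set.seed(1)
n  <- 1000
y0 <- rnorm(n)                      # Y_i(0): outcome if not treated
y1 <- y0 + 2 + rnorm(n, sd = 0.5)   # Y_i(1): outcome if treated (assumed mean effect 2)

tau_i <- y1 - y0                    # individual effects, unobservable in practice
ate   <- mean(tau_i)                # sample analogue of E[Y_i(1) - Y_i(0)]

# Linearity of expectation: mean of differences = difference of means
ate_alt <- mean(y1) - mean(y0)

# What we actually observe: one potential outcome per unit
d <- rbinom(n, size = 1, prob = 0.5)   # treatment indicator D_i
y <- d * y1 + (1 - d) * y0             # observed outcome Y_i
```

Here `ate` and `ate_alt` agree up to floating-point error, while `y` reveals `y1` only for treated units and `y0` only for controls.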

3. Assumptions for Causal Inference

4. Randomized Experiments

In randomized experiments, treatment assignment is independent of potential outcomes:

$$(Y_i(1), Y_i(0)) \perp\!\!\!\perp D_i$$

The simple difference-in-means estimator is then unbiased for the ATE, where $n_1$ and $n_0$ are the numbers of treated and control units:

$$\hat{\tau} = \frac{1}{n_1}\sum_{i:D_i=1} Y_i - \frac{1}{n_0}\sum_{i:D_i=0} Y_i$$
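A small sketch of the difference-in-means estimator on simulated data may help. The data-generating process (constant effect of 1.5, assignment by a fair coin) is an assumption of this example, and the standard error shown is the usual unequal-variance formula rather than anything specific to these notes.

```r
# Simulated randomized experiment (hypothetical numbers)
set.seed(42)
n  <- 10000
y0 <- rnorm(n)
y1 <- y0 + 1.5                     # assumed constant treatment effect of 1.5
d  <- rbinom(n, 1, 0.5)            # random assignment, independent of (y0, y1)
y  <- ifelse(d == 1, y1, y0)       # observed outcome

# Difference in means between treated and control units
tau_hat <- mean(y[d == 1]) - mean(y[d == 0])

# Standard error allowing different variances in the two groups
se_hat <- sqrt(var(y[d == 1]) / sum(d == 1) + var(y[d == 0]) / sum(d == 0))
```

Because assignment is random, `tau_hat` should land close to 1.5, within a few multiples of `se_hat`.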

5. Instrumental Variables (IV)

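When treatment is endogenous, we need an instrument $Z_i$ that is relevant ($Cov(Z_i, D_i) \neq 0$) and exogenous (it affects $Y_i$ only through $D_i$ and is unrelated to unobserved determinants of $Y_i$).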

The IV estimator is:

$$\hat{\tau}_{IV} = \frac{Cov(Z_i, Y_i)}{Cov(Z_i, D_i)}$$
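The covariance ratio can be computed directly in base R. The sketch below uses a made-up data-generating process with an unobserved confounder `u`, a binary instrument `z`, and an assumed constant effect of 2, so that the OLS slope is biased while the IV ratio is not.

```r
# Simulated endogenous treatment (hypothetical data)
set.seed(7)
n <- 20000
u <- rnorm(n)                                 # unobserved confounder
z <- rbinom(n, 1, 0.5)                        # instrument, independent of u
d <- as.numeric(2 * z + u + rnorm(n) > 1)     # treatment depends on z and u
y <- 2 * d + u + rnorm(n)                     # assumed true effect of d is 2

# Naive regression of y on d picks up the confounding through u
ols <- unname(coef(lm(y ~ d))["d"])

# IV estimator: sample covariance ratio
iv <- cov(z, y) / cov(z, d)
```

In this setup `ols` overstates the effect because `u` raises both `d` and `y`, whereas `iv` should be close to 2.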

6. Regression Discontinuity Design (RDD)

In a sharp RDD, treatment is determined entirely by whether a running variable $X_i$ reaches a cutoff $c$:

$$D_i = \mathbf{1}\{X_i \geq c\}$$

Assuming $E[Y_i(1)|X_i = x]$ and $E[Y_i(0)|X_i = x]$ are continuous at $c$, the treatment effect at the cutoff is:

$$\tau = \lim_{x \downarrow c} E[Y_i|X_i = x] - \lim_{x \uparrow c} E[Y_i|X_i = x]$$
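One common way to approximate the two one-sided limits is a local linear regression within a window around the cutoff. The sketch below does this in base R on simulated data; the bandwidth `h = 0.25`, the uniform kernel, and the jump of 3 are assumptions of the example, not data-driven choices.

```r
# Simulated sharp RDD (hypothetical data)
set.seed(3)
n      <- 5000
x      <- runif(n, -1, 1)                 # running variable
cutoff <- 0
d      <- as.numeric(x >= cutoff)         # treatment rule D_i = 1{X_i >= c}
y      <- 1 + 0.5 * x + 3 * d + rnorm(n, sd = 0.5)   # assumed jump of 3 at the cutoff

# Keep observations within h of the cutoff (uniform kernel)
h         <- 0.25
in_window <- abs(x - cutoff) <= h

# Separate intercepts and slopes on each side; the coefficient on d
# is the estimated discontinuity at x = cutoff
fit    <- lm(y ~ d * I(x - cutoff), subset = in_window)
tau_rd <- unname(coef(fit)["d"])
```

Centering the running variable at the cutoff is what makes the coefficient on `d` equal to the gap between the two fitted lines at $x = c$.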

7. Difference-in-Differences (DiD)

For panel data with a treated group ($D_i = 1$) and a control group ($D_i = 0$) observed before ($t = 0$) and after ($t = 1$) treatment:

$$\tau = (E[Y_{i1}|D_i=1] - E[Y_{i0}|D_i=1]) - (E[Y_{i1}|D_i=0] - E[Y_{i0}|D_i=0])$$

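Key assumption: parallel trends in the absence of treatment, i.e. without treatment the treated group's average outcome would have changed by the same amount as the control group's.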
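The two-group, two-period formula can be checked by hand against the interaction coefficient from a regression. The sketch below uses simulated data where the groups differ in levels but share a common trend of 1, with an assumed treatment effect of 2; all variable names are illustrative.

```r
# Simulated 2x2 DiD (hypothetical data)
set.seed(11)
n       <- 2000
treated <- rbinom(n, 1, 0.5)
alpha   <- rnorm(n) + treated                   # unit effects; groups differ in levels
y_pre   <- alpha + rnorm(n)                     # period t = 0
y_post  <- alpha + 1 + 2 * treated + rnorm(n)   # common trend 1, assumed effect 2

# DiD from the four group-by-period means
did <- (mean(y_post[treated == 1]) - mean(y_pre[treated == 1])) -
       (mean(y_post[treated == 0]) - mean(y_pre[treated == 0]))

# Same number from the interaction term in a saturated regression
long <- data.frame(
  y       = c(y_pre, y_post),
  treated = rep(treated, 2),
  post    = rep(c(0, 1), each = n)
)
did_reg <- unname(coef(lm(y ~ treated * post, data = long))["treated:post"])
```

The level difference between groups drops out of `did`, which is why only the trends, not the levels, need to be comparable.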

8. Matching Methods

Match each treated unit with one or more control units that have similar covariates $X_i$, then compare outcomes within the matched pairs.
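As a rough illustration, the sketch below does one-to-one nearest-neighbor matching with replacement on a single covariate in base R and averages the matched differences (an estimate of the effect on the treated). The simulated data, the single covariate, and the true effect of 1 are assumptions of the example; real applications usually involve several covariates and a dedicated matching package.

```r
# Simulated selection on an observed covariate (hypothetical data)
set.seed(5)
n <- 2000
x <- rnorm(n)
d <- rbinom(n, 1, plogis(x))                 # treatment more likely when x is high
y <- 1 + 2 * x + 1 * d + rnorm(n, sd = 0.5)  # assumed true effect of 1

xt <- x[d == 1]; yt <- y[d == 1]             # treated units
xc <- x[d == 0]; yc <- y[d == 0]             # control units

# For each treated unit, index of the closest control on x (with replacement)
nn <- vapply(xt, function(v) which.min(abs(xc - v)), integer(1))

att_match <- mean(yt - yc[nn])               # average matched difference
naive     <- mean(yt) - mean(yc)             # unadjusted comparison, confounded by x
```

The unadjusted comparison `naive` is pulled away from 1 because treated units have higher `x`, while `att_match` compares units with nearly equal `x`.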

9. Propensity Score

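The propensity score is $p(X_i) = P(D_i = 1|X_i)$. If ignorability holds given $X_i$, that is $(Y_i(1), Y_i(0)) \perp\!\!\!\perp D_i | X_i$, then it also holds given the propensity score: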

$$(Y_i(1), Y_i(0)) \perp\!\!\!\perp D_i | p(X_i)$$

This allows us to control for high-dimensional $X_i$ by controlling for the scalar $p(X_i)$.
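One way to use the scalar score is inverse probability weighting. The sketch below estimates $p(X_i)$ with a logistic regression and forms a normalized weighted difference in means; the simulated data, the logistic model, the true ATE of 2, and the choice of weighting (rather than matching or stratifying on the score) are all assumptions of this example.

```r
# Simulated confounding through two covariates (hypothetical data)
set.seed(9)
n  <- 5000
x1 <- rnorm(n)
x2 <- rnorm(n)
d  <- rbinom(n, 1, plogis(0.8 * x1 + 0.5 * x2))   # true propensity score is logistic
y  <- 1 + x1 + x2 + 2 * d + rnorm(n)              # assumed true ATE of 2

# Estimate the propensity score p(X_i) = P(D_i = 1 | X_i)
ps_fit <- glm(d ~ x1 + x2, family = binomial())
p_hat  <- fitted(ps_fit)

# Normalized inverse-probability-weighted difference in means
ate_ipw <- weighted.mean(y[d == 1], 1 / p_hat[d == 1]) -
           weighted.mean(y[d == 0], 1 / (1 - p_hat[d == 0]))

naive <- mean(y[d == 1]) - mean(y[d == 0])        # ignores x1 and x2
```

Weighting needs estimated scores strictly between 0 and 1 for every unit, so it is worth inspecting `range(p_hat)` before trusting `ate_ipw`.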

10. Code Example

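# R code for causal inference
# Assumes the data frames `mydata` and `panel_data` already exist.
library(AER)   # ivreg()
library(rdd)   # RDestimate()
library(plm)   # plm()

# Instrumental Variables: outcome y, endogenous regressor x, instrument z
iv_model <- ivreg(y ~ x | z, data = mydata)
summary(iv_model)

# Regression Discontinuity: outcome y, running variable x, cutoff at 0
rd_model <- RDestimate(y ~ x, data = mydata, cutpoint = 0)
summary(rd_model)

# Difference-in-Differences with unit fixed effects.
# The time-invariant `treated` main effect is absorbed by the unit effects,
# so only `post` and the interaction (the DiD estimate) are identified.
# `id` and `time` are assumed names for the unit and period columns.
did_model <- plm(y ~ post + treated:post,
                 data = panel_data, index = c("id", "time"), model = "within")
summary(did_model)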
