Regression Methods
Disclaimer: These are my personal notes compiled for my own reference and learning. They may contain errors, incomplete information, or personal interpretations. While I strive for accuracy, these notes are not peer-reviewed and should not be considered authoritative sources. Please consult official textbooks, research papers, or other reliable sources for academic or professional purposes.
1. Linear Regression
The basic linear regression model is:
$Y_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_k X_{ik} + \epsilon_i$
where $\epsilon_i \sim N(0, \sigma^2)$ and $i = 1, 2, \ldots, n$.
2. Ordinary Least Squares (OLS)
The OLS estimator minimizes the sum of squared residuals:
$\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^n (Y_i - X_i'\beta)^2$
The solution, in matrix form, is:
$\hat{\beta} = (X'X)^{-1}X'Y$
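A minimal R sketch of the closed-form formula, checked against lm(); the simulated data and the names n, x1, x2 below are illustrative, not from any particular data set.
# Sketch: OLS via (X'X)^{-1} X'y on simulated data, compared with lm()
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)
X  <- cbind(1, x1, x2)                        # design matrix with intercept
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y  # closed-form OLS
cbind(beta_hat, coef(lm(y ~ x1 + x2)))        # the two columns should match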
3. Assumptions (Gauss-Markov)
- Linearity: $E[Y|X] = X\beta$
- Random sampling: $(X_i, Y_i)$ are i.i.d.
- No perfect multicollinearity: $X$ has full rank
- Homoscedasticity: $Var(\epsilon_i|X_i) = \sigma^2$
- No autocorrelation: $Cov(\epsilon_i, \epsilon_j|X) = 0$ for $i \neq j$
4. Properties of OLS
Under the Gauss-Markov assumptions, the OLS estimator is (a short simulation check follows this list):
- Unbiased: $E[\hat{\beta}] = \beta$
- BLUE: Best Linear Unbiased Estimator
- Consistent: $\hat{\beta} \xrightarrow{p} \beta$
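As a quick sanity check on unbiasedness, a small simulation sketch; the true coefficients, sample size, and number of replications are arbitrary choices for illustration.
# Sketch: the average OLS estimate over many simulated samples is close to the true beta
set.seed(2)
true_beta <- c(1, 2)                      # intercept and slope (illustrative)
est <- replicate(2000, {
  x <- rnorm(50)
  y <- true_beta[1] + true_beta[2] * x + rnorm(50)
  coef(lm(y ~ x))
})
rowMeans(est)                             # should be close to c(1, 2)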
5. Heteroscedasticity
When $Var(\epsilon_i|X_i) \neq \sigma^2$, we have heteroscedasticity. Common remedies (see the WLS sketch after this list):
- Robust standard errors: White's estimator
- Weighted Least Squares (WLS): When variance structure is known
- Feasible GLS: When variance structure is estimated
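A minimal WLS sketch, assuming the error variance is proportional to $x_i^2$; the data are simulated and the weighting rule is purely illustrative. Robust (White) standard errors for an unweighted fit are shown in the code example in section 9.
# Sketch: WLS with weights = 1/variance, here assuming Var(eps_i | x_i) proportional to x_i^2
set.seed(3)
x <- runif(100, 1, 10)
y <- 1 + 2 * x + rnorm(100, sd = 0.5 * x)    # error sd grows with x
ols <- lm(y ~ x)                             # ignores heteroscedasticity
wls <- lm(y ~ x, weights = 1 / x^2)          # downweights high-variance observations
summary(wls)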
6. Multicollinearity
When predictors are highly correlated, we can use penalized regression (see the glmnet sketch after this list):
- Ridge Regression: Adds penalty $\lambda \sum_{j=1}^k \beta_j^2$
- Lasso: Adds penalty $\lambda \sum_{j=1}^k |\beta_j|$
- Elastic Net: Combines both penalties
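A minimal sketch using the glmnet package, where alpha = 0 gives ridge, alpha = 1 gives the lasso, and values in between give the elastic net; the simulated data are illustrative.
# Sketch: cross-validated ridge and lasso with glmnet on simulated data
library(glmnet)
set.seed(4)
X <- matrix(rnorm(100 * 10), nrow = 100)     # 10 candidate predictors
y <- X[, 1] + 0.5 * X[, 2] + rnorm(100)
ridge <- cv.glmnet(X, y, alpha = 0)          # ridge: L2 penalty
lasso <- cv.glmnet(X, y, alpha = 1)          # lasso: L1 penalty
coef(lasso, s = "lambda.min")                # coefficients at the CV-selected lambda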
7. Model Selection
Information criteria for model selection:
$AIC = 2k - 2\ln L$
$BIC = k \ln n - 2\ln L$
where $k$ is the number of parameters, $n$ is the sample size, and $L$ is the maximized likelihood.
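A short sketch comparing two candidate models by AIC and BIC; mydata, x1, and x2 are placeholders matching the code example in section 9, and lower values indicate a better fit-complexity trade-off.
# Sketch: comparing candidate models with information criteria
m1 <- lm(y ~ x1, data = mydata)
m2 <- lm(y ~ x1 + x2, data = mydata)
AIC(m1, m2)   # smaller AIC preferred
BIC(m1, m2)   # BIC penalizes extra parameters more heavily for large n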
8. Diagnostics
- Residual plots: Check for patterns
- Q-Q plots: Check normality
- Leverage: $h_{ii} = X_i'(X'X)^{-1}X_i$
- Cook's distance: Measure of influence (computed, along with leverage, in the sketch after this list)
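A minimal sketch, assuming a fitted lm object called model as in the code example in section 9; the cutoffs shown are common rules of thumb, not hard thresholds.
# Sketch: leverage and Cook's distance for a fitted lm object
h <- hatvalues(model)            # leverage h_ii
d <- cooks.distance(model)       # Cook's distance
which(h > 2 * mean(h))           # rule of thumb for flagging high-leverage points
which(d > 4 / length(d))         # rule of thumb for flagging influential points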
9. Code Example
# R code for linear regression
library(car)
# Fit model
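# mydata is assumed to be a data frame containing y, x1, x2, x3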
model <- lm(y ~ x1 + x2 + x3, data = mydata)
summary(model)
# Check assumptions
plot(model)
vif(model) # Check multicollinearity
# Robust standard errors
library(sandwich)
library(lmtest)
coeftest(model, vcov = vcovHC(model, type = "HC1"))
10. References
- Wooldridge, J. M. (2016). Introductory Econometrics: A Modern Approach.
- Greene, W. H. (2018). Econometric Analysis.
- Hayashi, F. (2000). Econometrics.