Regression Methods

Last updated: December 2024

Disclaimer: These are my personal notes compiled for my own reference and learning. They may contain errors, incomplete information, or personal interpretations. While I strive for accuracy, these notes are not peer-reviewed and should not be considered authoritative sources. Please consult official textbooks, research papers, or other reliable sources for academic or professional purposes.

1. Linear Regression

The basic linear regression model is:

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_k X_{ik} + \epsilon_i$$

where $\epsilon_i \sim N(0, \sigma^2)$ and $i = 1, 2, \ldots, n$.
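As a quick sanity check, here is a minimal simulation of this model in R. The coefficient values, sample size, and variable names are assumptions made up for the example, not part of the notes.

# Simulate data from the linear model above and recover the coefficients
set.seed(42)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)  # beta0 = 1, beta1 = 2, beta2 = -0.5, sigma = 1
fit <- lm(y ~ x1 + x2)
coef(fit)  # estimates should land near (1, 2, -0.5)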

2. Ordinary Least Squares (OLS)

The OLS estimator minimizes the sum of squared residuals:

$$\min_{\beta} \sum_{i=1}^n (Y_i - X_i'\beta)^2$$

The solution is:

$$\hat{\beta} = (X'X)^{-1}X'Y$$
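The closed form can be computed by hand as a check. This sketch reuses the simulated x1, x2, and y from the Section 1 example.

# Closed-form OLS solution computed directly from the design matrix
X <- cbind(1, x1, x2)  # design matrix with an intercept column
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y
beta_hat  # matches coef(lm(y ~ x1 + x2))
# Numerically, solve(crossprod(X), crossprod(X, y)) is preferable to an explicit inverse.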

3. Assumptions (Gauss-Markov)

The Gauss-Markov theorem requires:

- Linearity: the model is linear in the parameters $\beta$.
- No perfect multicollinearity: $X$ has full column rank, so $(X'X)^{-1}$ exists.
- Strict exogeneity: $E[\epsilon_i | X] = 0$.
- Spherical errors: $Var(\epsilon_i | X) = \sigma^2$ for all $i$, with errors uncorrelated across observations.

Normality of the errors is not required for the theorem; it is only needed for exact finite-sample inference (t- and F-tests).

4. Properties of OLS

Under the Gauss-Markov assumptions, OLS is:

- Unbiased: $E[\hat{\beta}] = \beta$.
- BLUE: the best (minimum-variance) linear unbiased estimator.
- Consistent: $\hat{\beta} \to \beta$ as $n \to \infty$, under mild extra conditions.

A small Monte Carlo sketch of unbiasedness follows.
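The true slope of 2 below is an arbitrary choice for illustration.

# Monte Carlo check: the average OLS slope across many samples is near the truth
set.seed(1)
slopes <- replicate(1000, {
  x <- rnorm(50)
  y <- 1 + 2 * x + rnorm(50)
  coef(lm(y ~ x))[2]
})
mean(slopes)  # close to the true slope 2, consistent with unbiasedness
sd(slopes)    # Monte Carlo spread of the estimator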

5. Heteroscedasticity

When $Var(\epsilon_i|X_i) \neq \sigma^2$, we have heteroscedasticity. OLS remains unbiased, but the usual standard errors are invalid. Solutions:

- Heteroscedasticity-robust (White/HC) standard errors.
- Weighted least squares (WLS) when the variance structure is known or can be modeled.
- Transforming the response (e.g., taking logs) to stabilize the variance.

A detection-plus-WLS sketch follows the list.
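This sketch reuses the fitted object from Section 1; the variance model $Var(\epsilon_i) \propto e^{x_{i1}}$ is an assumption made up for illustration.

# Breusch-Pagan test, then a WLS fit under an assumed variance model
library(lmtest)
bptest(fit)  # small p-value suggests heteroscedasticity
wls <- lm(y ~ x1 + x2, weights = exp(-x1))  # weights = 1 / assumed variance
summary(wls)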

6. Multicollinearity

When predictors are highly correlated, coefficient estimates become unstable and standard errors inflate. We can use:

- Variance inflation factors (VIF) to diagnose the problem.
- Dropping or combining redundant predictors.
- Ridge regression, which shrinks coefficients via an L2 penalty.
- Principal components regression.

A ridge sketch follows the list.
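This sketch assumes the same placeholder data frame mydata, with columns y, x1, x2, x3, used in the code example of Section 9.

# Ridge regression via glmnet; alpha = 0 selects the L2 (ridge) penalty
library(glmnet)
X  <- model.matrix(y ~ x1 + x2 + x3, data = mydata)[, -1]  # predictors without intercept
cv <- cv.glmnet(X, mydata$y, alpha = 0)  # 10-fold CV over a grid of lambda values
coef(cv, s = "lambda.min")               # coefficients at the CV-chosen lambda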

7. Model Selection

Information criteria for model selection:

$$AIC = 2k - 2\ln(L)$$

$$BIC = k\ln(n) - 2\ln(L)$$

where $k$ is the number of parameters, $n$ is sample size, and $L$ is the likelihood.
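In R, AIC() and BIC() compute these directly. A minimal comparison sketch, again reusing the placeholder mydata (the model formulas are arbitrary):

# Compare candidate models by information criteria; lower is better
m1 <- lm(y ~ x1, data = mydata)
m2 <- lm(y ~ x1 + x2 + x3, data = mydata)
AIC(m1, m2)
BIC(m1, m2)  # BIC penalizes extra parameters more heavily as n grows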

8. Diagnostics

Standard checks after fitting:

- Residuals vs. fitted plot: linearity and heteroscedasticity.
- Normal Q-Q plot of residuals: normality.
- Scale-location plot: constant variance.
- Residuals vs. leverage and Cook's distance: influential observations.
- VIF: multicollinearity (see Section 6).

A small influence-check sketch follows.
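Beyond plot(model), Cook's distance can be extracted directly. This assumes the fitted model object from the code example in Section 9; the 4/n cutoff is a common rule of thumb, not a hard threshold.

# Flag observations with unusually large influence on the fit
d <- cooks.distance(model)
which(d > 4 / nobs(model))  # candidates worth inspecting individually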

9. Code Example

# R code for linear regression
library(car)  # provides vif()

# Fit model
model <- lm(y ~ x1 + x2 + x3, data = mydata)
summary(model)

# Check assumptions: residuals vs. fitted, Q-Q, scale-location, leverage plots
plot(model)
vif(model)  # variance inflation factors; values above ~10 flag multicollinearity

# Heteroscedasticity-robust (HC1) standard errors
library(sandwich)
library(lmtest)
coeftest(model, vcov = vcovHC(model, type = "HC1"))
