Probabilistic Models
Disclaimer: These are my personal notes compiled for my own reference and learning. They may contain errors, incomplete information, or personal interpretations. While I strive for accuracy, these notes are not peer-reviewed and should not be considered authoritative sources. Please consult official textbooks, research papers, or other reliable sources for academic or professional purposes.
1. Introduction to Probabilistic Models
Probabilistic models use probability theory to represent uncertainty in data and model parameters. They provide a principled framework for reasoning under uncertainty.
2. Bayes' Theorem
The foundation of Bayesian inference:

$$P(\theta|D) = \frac{P(D|\theta)\, P(\theta)}{P(D)}$$

where:
- $P(\theta|D)$ is the posterior distribution
- $P(D|\theta)$ is the likelihood
- $P(\theta)$ is the prior distribution
- $P(D)$ is the marginal likelihood (evidence)
3. Maximum Likelihood Estimation (MLE)
Find parameters that maximize the likelihood:

$$\hat{\theta}_{\text{MLE}} = \arg\max_\theta P(D|\theta)$$

In practice the log-likelihood is optimized instead, since it turns products over independent observations into sums and is numerically more stable:

$$\hat{\theta}_{\text{MLE}} = \arg\max_\theta \sum_{i=1}^{N} \log P(x_i|\theta)$$
4. Maximum A Posteriori (MAP) Estimation
Find parameters that maximize the posterior:

$$\hat{\theta}_{\text{MAP}} = \arg\max_\theta P(\theta|D) = \arg\max_\theta P(D|\theta)\, P(\theta)$$

MAP reduces to MLE when the prior is uniform, since a constant prior does not change the location of the maximum. A small numerical comparison follows below.
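To make the difference concrete, here is a minimal sketch for a Bernoulli parameter with a Beta prior, where both estimates have closed forms; the coin-flip data and the prior hyperparameters a, b are assumed for illustration.

import numpy as np

# Coin-flip data: 7 heads out of 10 tosses (illustrative).
flips = np.array([1, 1, 1, 1, 1, 1, 1, 0, 0, 0])
k, n = flips.sum(), len(flips)

# MLE of a Bernoulli parameter is the empirical frequency k / n.
theta_mle = k / n

# MAP with a Beta(a, b) prior has the closed form (k + a - 1) / (n + a + b - 2).
a, b = 2.0, 2.0   # assumed prior hyperparameters
theta_map = (k + a - 1) / (n + a + b - 2)

print(f"MLE: {theta_mle:.3f}, MAP: {theta_map:.3f}")   # 0.700 vs 0.667

With a uniform Beta(1, 1) prior the MAP estimate collapses to the MLE, matching the statement above.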
5. Bayesian Inference
Instead of committing to a point estimate, maintain the full posterior distribution $P(\theta|D)$. Predictions then integrate over all possible parameters (the posterior predictive distribution):

$$P(y^*|x^*, D) = \int P(y^*|x^*, \theta)\, P(\theta|D)\, d\theta$$
6. Conjugate Priors
Priors that, when combined with the likelihood, yield a posterior in the same distributional family as the prior, so the Bayesian update is available in closed form.
6.1 Beta-Binomial
For a binomial likelihood with $k$ successes in $n$ trials and a $\text{Beta}(\alpha, \beta)$ prior, the posterior is again a Beta distribution:

$$P(\theta|D) = \text{Beta}(\alpha + k,\ \beta + n - k)$$
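A minimal sketch of this conjugate update with scipy.stats; the prior hyperparameters and the observed counts are assumed for illustration.

from scipy import stats

a, b = 2.0, 2.0   # Beta prior hyperparameters (assumed)
k, n = 7, 10      # observed successes and trials (assumed)

posterior = stats.beta(a + k, b + n - k)   # Beta(9, 5)
print(f"Posterior mean: {posterior.mean():.3f}")            # (a + k) / (a + b + n) = 0.643
print(f"95% credible interval: {posterior.interval(0.95)}")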
6.2 Gaussian-Gaussian
For a Gaussian likelihood with known variance $\sigma^2$ and a Gaussian prior $\mathcal{N}(\mu_0, \sigma_0^2)$ on the mean, the posterior over the mean is Gaussian with

$$\sigma_N^2 = \left(\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}\right)^{-1}, \qquad \mu_N = \sigma_N^2 \left(\frac{\mu_0}{\sigma_0^2} + \frac{n\bar{x}}{\sigma^2}\right)$$

where $n$ is the number of observations and $\bar{x}$ their sample mean.
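A small sketch of this update; the prior, noise variance, and synthetic data below are assumed for illustration.

import numpy as np

np.random.seed(0)
sigma2 = 1.0                  # known likelihood (noise) variance
mu0, sigma0_2 = 0.0, 10.0     # prior mean and variance (assumed)
x = np.random.normal(2.0, np.sqrt(sigma2), size=50)   # synthetic observations

n, xbar = len(x), x.mean()
post_var = 1.0 / (1.0 / sigma0_2 + n / sigma2)
post_mean = post_var * (mu0 / sigma0_2 + n * xbar / sigma2)
print(f"Posterior over the mean: N({post_mean:.3f}, {post_var:.4f})")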
7. Graphical Models
Graph-based representations of the conditional dependence structure among random variables.
7.1 Bayesian Networks (Directed)
Directed acyclic graphs representing conditional dependencies; the joint distribution factorizes as a product of each node given its parents:

$$P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i|\mathrm{parents}(x_i))$$
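A tiny sketch of this factorization for a hypothetical chain network A -> B -> C; all conditional probability table values are made up.

# P(a, b, c) = P(a) * P(b | a) * P(c | b) for a chain A -> B -> C.
P_A = {True: 0.3, False: 0.7}
P_B_given_A = {True: {True: 0.8, False: 0.2}, False: {True: 0.1, False: 0.9}}
P_C_given_B = {True: {True: 0.6, False: 0.4}, False: {True: 0.05, False: 0.95}}

def joint(a, b, c):
    """Joint probability of one assignment from the chain factorization."""
    return P_A[a] * P_B_given_A[a][b] * P_C_given_B[b][c]

print(joint(True, True, True))   # 0.3 * 0.8 * 0.6 = 0.144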
7.2 Markov Random Fields (Undirected)
Undirected graphs whose joint distribution is defined by potential functions over cliques:

$$P(x) = \frac{1}{Z} \prod_{c} \psi_c(x_c)$$

where $Z$ is the partition function (normalizing constant) and $\psi_c$ are the clique potentials.
8. Hidden Markov Models (HMMs)
Model for sequential data with hidden states $z_t$ and observations $x_t$; the joint distribution factorizes as

$$P(x_{1:T}, z_{1:T}) = P(z_1) \prod_{t=2}^{T} P(z_t|z_{t-1}) \prod_{t=1}^{T} P(x_t|z_t)$$
8.1 Forward Algorithm
Compute the marginal likelihood $P(x_{1:T})$ by recursively propagating $\alpha_t(k) = P(x_{1:t}, z_t = k)$:

$$\alpha_t(k) = P(x_t|z_t = k) \sum_{j} \alpha_{t-1}(j)\, P(z_t = k|z_{t-1} = j)$$
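A minimal forward-algorithm sketch for a discrete HMM with per-step rescaling to avoid underflow; the transition, emission, and initial probabilities are assumed for illustration.

import numpy as np

def forward_log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: returns log P(x_{1:T}) for a discrete HMM.
    pi[k] = P(z_1 = k), A[j, k] = P(z_t = k | z_{t-1} = j), B[k, m] = P(x_t = m | z_t = k)."""
    alpha = pi * B[:, obs[0]]
    log_lik = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for x in obs[1:]:
        alpha = (alpha @ A) * B[:, x]    # alpha_t(k) = P(x_t | k) * sum_j alpha_{t-1}(j) A[j, k]
        log_lik += np.log(alpha.sum())   # accumulate the scaling factors
        alpha = alpha / alpha.sum()
    return log_lik

# Hypothetical 2-state, 2-symbol HMM and a short observation sequence.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
print(forward_log_likelihood([0, 1, 0], pi, A, B))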
8.2 Viterbi Algorithm
Find the most likely hidden state sequence $\arg\max_{z_{1:T}} P(z_{1:T}|x_{1:T})$ with a dynamic-programming recursion that replaces the sum in the forward algorithm with a max and backtracks to recover the path.
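A matching Viterbi sketch in log space, reusing the same hypothetical HMM parameters as the forward-algorithm example above.

import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely state sequence argmax_z P(z | x) for a discrete HMM (log-space DP)."""
    T, K = len(obs), len(pi)
    delta = np.log(pi) + np.log(B[:, obs[0]])   # best log-prob of a path ending in each state
    backptr = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + np.log(A)     # scores[j, k]: best path to j, then transition j -> k
        backptr[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + np.log(B[:, obs[t]])
    path = [int(delta.argmax())]                # backtrack from the best final state
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
print(viterbi([0, 1, 0], pi, A, B))   # -> [0, 1, 0] for this example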
9. Gaussian Mixture Models (GMM)
A weighted sum of $K$ Gaussian components:

$$p(x) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x|\mu_k, \Sigma_k)$$

where $\pi_k$ are mixing coefficients with $\sum_k \pi_k = 1$ and $\pi_k \geq 0$. The code example in Section 16 fits a GMM with EM.
10. Expectation-Maximization (EM) Algorithm
Iterative algorithm for maximum-likelihood estimation in latent-variable models. It alternates between the two steps below until the log-likelihood converges.
10.1 E-step
Compute the posterior over the latent variables given the current parameters $\theta^{(t)}$:

$$q(z) = P(z|x, \theta^{(t)})$$
10.2 M-step
Maximize the expected complete-data log-likelihood under $q$:

$$\theta^{(t+1)} = \arg\max_\theta\ \mathbb{E}_{q(z)}\!\left[\log P(x, z|\theta)\right]$$
11. Variational Inference
Approximate an intractable posterior $P(\theta|D)$ with a simpler distribution $q(\theta)$, chosen to minimize the KL divergence $\mathrm{KL}(q(\theta)\,\|\,P(\theta|D))$.
11.1 Mean Field Variational Inference
Assume a fully factorized posterior:

$$q(\theta) = \prod_{i} q_i(\theta_i)$$
11.2 Evidence Lower Bound (ELBO)

Maximizing the ELBO is equivalent to minimizing the KL divergence above, since

$$\mathrm{ELBO}(q) = \mathbb{E}_{q(\theta)}\!\left[\log P(D, \theta) - \log q(\theta)\right] = \log P(D) - \mathrm{KL}(q(\theta)\,\|\,P(\theta|D))$$
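A minimal Monte Carlo ELBO estimate for a toy model (standard normal prior on the mean, unit-variance Gaussian likelihood, and a Gaussian approximation q); the data and the choices of q are assumed for illustration.

import numpy as np
from scipy import stats

np.random.seed(0)
data = np.random.normal(1.5, 1.0, size=20)   # synthetic observations

def elbo(q_mean, q_std, data, n_samples=5000):
    """Monte Carlo estimate of E_q[log p(D | theta) + log p(theta) - log q(theta)]."""
    theta = np.random.normal(q_mean, q_std, size=n_samples)
    log_lik = stats.norm.logpdf(data[:, None], loc=theta, scale=1.0).sum(axis=0)
    log_prior = stats.norm.logpdf(theta, 0.0, 1.0)
    log_q = stats.norm.logpdf(theta, q_mean, q_std)
    return np.mean(log_lik + log_prior - log_q)

# The exact posterior here is N(sum(x) / (n + 1), 1 / (n + 1)); a q near it gives a higher ELBO.
n = len(data)
print(elbo(data.sum() / (n + 1), np.sqrt(1 / (n + 1)), data))
print(elbo(0.0, 1.0, data))   # a poorer q gives a lower ELBO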
12. Markov Chain Monte Carlo (MCMC)
Sample from complex distributions, often known only up to a normalizing constant, by constructing a Markov chain whose stationary distribution is the target.
12.1 Metropolis-Hastings Algorithm
- Propose new state: $\theta' \sim q(\theta'|\theta^{(t)})$
- Compute acceptance ratio: $\alpha = \min\left(1, \frac{P(\theta')q(\theta^{(t)}|\theta')}{P(\theta^{(t)})q(\theta'|\theta^{(t)})}\right)$
- Accept $\theta^{(t+1)} = \theta'$ with probability $\alpha$; otherwise keep $\theta^{(t+1)} = \theta^{(t)}$ (see the sketch after this list)
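A minimal random-walk Metropolis sketch targeting an unnormalized 1-D density; the bimodal target and the proposal scale are assumed for illustration. With a symmetric proposal the $q$ terms in the acceptance ratio cancel.

import numpy as np

np.random.seed(0)

def log_target(theta):
    """Unnormalized log-density of a toy bimodal target (two equal-weight Gaussians at +/-2)."""
    return np.logaddexp(-0.5 * (theta - 2.0) ** 2, -0.5 * (theta + 2.0) ** 2)

def metropolis_hastings(log_target, n_samples=10000, proposal_std=1.0):
    samples = np.empty(n_samples)
    theta = 0.0
    for t in range(n_samples):
        proposal = theta + proposal_std * np.random.randn()   # symmetric random-walk proposal
        # Accept with probability min(1, p(proposal) / p(theta)); q cancels because it is symmetric.
        if np.log(np.random.rand()) < log_target(proposal) - log_target(theta):
            theta = proposal
        samples[t] = theta
    return samples

samples = metropolis_hastings(log_target)
print(samples.mean(), samples.std())   # roughly zero mean, spread covering both modes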
12.2 Gibbs Sampling
Sample each variable in turn, conditioned on the current values of all the others:

$$\theta_i^{(t+1)} \sim P(\theta_i|\theta_1^{(t+1)}, \ldots, \theta_{i-1}^{(t+1)}, \theta_{i+1}^{(t)}, \ldots, \theta_d^{(t)})$$
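A minimal Gibbs sampler for a standard bivariate Gaussian with correlation rho, where both conditionals are known in closed form; the correlation value is assumed for illustration.

import numpy as np

np.random.seed(0)
rho = 0.8                     # correlation of the target bivariate standard normal (assumed)
n_samples = 10000
samples = np.zeros((n_samples, 2))
x, y = 0.0, 0.0

for t in range(n_samples):
    # Conditionals of a standard bivariate normal: x | y ~ N(rho * y, 1 - rho^2), and symmetrically for y | x.
    x = np.random.normal(rho * y, np.sqrt(1 - rho ** 2))
    y = np.random.normal(rho * x, np.sqrt(1 - rho ** 2))
    samples[t] = [x, y]

print(np.corrcoef(samples.T)[0, 1])   # should be close to rho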
13. Naive Bayes Classifier
Assumes conditional independence of the features given the class:

$$P(y|x_1, \ldots, x_d) \propto P(y) \prod_{j=1}^{d} P(x_j|y)$$

The naive_bayes_gaussian function in the code example in Section 16 implements the Gaussian variant.
14. Gaussian Processes
Non-parametric Bayesian method for regression and classification. A Gaussian process defines a distribution over functions:

$$f(x) \sim \mathcal{GP}(m(x), k(x, x'))$$

where $m(x)$ is the mean function and $k(x, x')$ is the covariance (kernel) function.
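A small GP regression sketch with an RBF kernel; the lengthscale, signal variance, noise level, and training data are all assumed for illustration.

import numpy as np

def rbf_kernel(X1, X2, lengthscale=0.3, variance=1.0):
    """Squared-exponential (RBF) covariance function for 1-D inputs."""
    sq_dists = (X1[:, None] - X2[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dists / lengthscale ** 2)

np.random.seed(0)
X_train = np.random.uniform(0, 1, 8)
y_train = np.sin(2 * np.pi * X_train) + 0.1 * np.random.randn(8)
X_test = np.linspace(0, 1, 50)
noise = 0.01

# Standard GP regression: posterior mean K_*^T (K + noise I)^{-1} y and matching covariance.
K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
K_s = rbf_kernel(X_train, X_test)
K_ss = rbf_kernel(X_test, X_test)

mean = K_s.T @ np.linalg.solve(K, y_train)
cov = K_ss - K_s.T @ np.linalg.solve(K, K_s)
std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
print(mean[:5])
print(std[:5])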
15. Bayesian Neural Networks
Place priors over the neural network weights, e.g. $P(w) = \mathcal{N}(0, \sigma^2 I)$, and infer the posterior $P(w|D)$. Predictions average over the weight posterior:

$$P(y^*|x^*, D) = \int P(y^*|x^*, w)\, P(w|D)\, dw$$
16. Code Example
# Python implementation of probabilistic models
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from scipy.special import logsumexp
class GaussianMixtureModel:
def __init__(self, n_components=2, random_state=None):
self.n_components = n_components
self.random_state = random_state
def fit(self, X, max_iter=100, tol=1e-6):
"""Fit GMM using EM algorithm"""
np.random.seed(self.random_state)
n_samples, n_features = X.shape
# Initialize parameters
self.weights_ = np.ones(self.n_components) / self.n_components
self.means_ = X[np.random.choice(n_samples, self.n_components, replace=False)]
self.covariances_ = np.array([np.cov(X.T) for _ in range(self.n_components)])
log_likelihood_old = -np.inf
for iteration in range(max_iter):
# E-step: compute responsibilities
responsibilities = self._e_step(X)
# M-step: update parameters
self._m_step(X, responsibilities)
# Check convergence
log_likelihood = self._compute_log_likelihood(X)
if abs(log_likelihood - log_likelihood_old) < tol:
print(f"Converged after {iteration + 1} iterations")
break
log_likelihood_old = log_likelihood
return self
def _e_step(self, X):
"""Expectation step"""
n_samples = X.shape[0]
responsibilities = np.zeros((n_samples, self.n_components))
for k in range(self.n_components):
responsibilities[:, k] = self.weights_[k] * stats.multivariate_normal.pdf(
X, self.means_[k], self.covariances_[k]
)
# Normalize
responsibilities /= responsibilities.sum(axis=1, keepdims=True)
return responsibilities
def _m_step(self, X, responsibilities):
"""Maximization step"""
n_samples = X.shape[0]
for k in range(self.n_components):
resp_k = responsibilities[:, k]
# Update weights
self.weights_[k] = resp_k.sum() / n_samples
# Update means
self.means_[k] = (resp_k[:, np.newaxis] * X).sum(axis=0) / resp_k.sum()
# Update covariances
diff = X - self.means_[k]
self.covariances_[k] = np.dot(resp_k * diff.T, diff) / resp_k.sum()
def _compute_log_likelihood(self, X):
"""Compute log-likelihood"""
n_samples = X.shape[0]
log_likelihood = 0
for i in range(n_samples):
sample_likelihood = 0
for k in range(self.n_components):
sample_likelihood += self.weights_[k] * stats.multivariate_normal.pdf(
X[i], self.means_[k], self.covariances_[k]
)
log_likelihood += np.log(sample_likelihood)
return log_likelihood
def predict_proba(self, X):
"""Predict component probabilities"""
return self._e_step(X)
def predict(self, X):
"""Predict component assignments"""
return self.predict_proba(X).argmax(axis=1)
class BayesianLinearRegression:
def __init__(self, alpha=1.0, beta=1.0):
"""
alpha: precision of prior over weights
beta: precision of noise
"""
self.alpha = alpha
self.beta = beta
def fit(self, X, y):
"""Fit Bayesian linear regression"""
# Add bias term
X_with_bias = np.column_stack([np.ones(X.shape[0]), X])
# Prior precision matrix
S0_inv = self.alpha * np.eye(X_with_bias.shape[1])
# Posterior precision matrix
self.SN_inv = S0_inv + self.beta * X_with_bias.T @ X_with_bias
self.SN = np.linalg.inv(self.SN_inv)
# Posterior mean
self.mN = self.beta * self.SN @ X_with_bias.T @ y
return self
def predict(self, X, return_std=False):
"""Make predictions with uncertainty"""
X_with_bias = np.column_stack([np.ones(X.shape[0]), X])
# Predictive mean
y_mean = X_with_bias @ self.mN
if return_std:
# Predictive variance
y_var = 1/self.beta + np.diag(X_with_bias @ self.SN @ X_with_bias.T)
y_std = np.sqrt(y_var)
return y_mean, y_std
return y_mean
def naive_bayes_gaussian(X_train, y_train, X_test):
"""Gaussian Naive Bayes classifier"""
classes = np.unique(y_train)
n_classes = len(classes)
n_features = X_train.shape[1]
# Compute class priors
class_priors = np.array([np.mean(y_train == c) for c in classes])
# Compute class-conditional means and variances
means = np.zeros((n_classes, n_features))
variances = np.zeros((n_classes, n_features))
for i, c in enumerate(classes):
X_c = X_train[y_train == c]
means[i] = np.mean(X_c, axis=0)
variances[i] = np.var(X_c, axis=0)
# Make predictions
n_test = X_test.shape[0]
log_probs = np.zeros((n_test, n_classes))
for i, c in enumerate(classes):
# Log prior
log_probs[:, i] = np.log(class_priors[i])
# Log likelihood (assuming independence)
for j in range(n_features):
log_probs[:, i] += stats.norm.logpdf(X_test[:, j], means[i, j], np.sqrt(variances[i, j]))
# Normalize to get probabilities
log_probs -= logsumexp(log_probs, axis=1, keepdims=True)
probs = np.exp(log_probs)
predictions = classes[np.argmax(probs, axis=1)]
return predictions, probs
# Example usage
if __name__ == "__main__":
# Generate synthetic data for GMM
np.random.seed(42)
# Create mixture of two Gaussians
n_samples = 300
X1 = np.random.multivariate_normal([2, 2], [[1, 0.5], [0.5, 1]], n_samples//2)
X2 = np.random.multivariate_normal([-1, -1], [[1, -0.3], [-0.3, 1]], n_samples//2)
X_gmm = np.vstack([X1, X2])
# Fit GMM
gmm = GaussianMixtureModel(n_components=2, random_state=42)
gmm.fit(X_gmm)
# Plot results
plt.figure(figsize=(12, 4))
plt.subplot(1, 3, 1)
plt.scatter(X_gmm[:, 0], X_gmm[:, 1], alpha=0.6)
plt.title('Original Data')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.subplot(1, 3, 2)
predictions = gmm.predict(X_gmm)
plt.scatter(X_gmm[:, 0], X_gmm[:, 1], c=predictions, alpha=0.6, cmap='viridis')
plt.scatter(gmm.means_[:, 0], gmm.means_[:, 1], c='red', marker='x', s=200, linewidths=3)
plt.title('GMM Clustering')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
# Bayesian Linear Regression example
plt.subplot(1, 3, 3)
# Generate synthetic regression data
x_true = np.linspace(0, 1, 20)
y_true = 0.5 * x_true + 0.3 * np.sin(2 * np.pi * x_true) + 0.1 * np.random.randn(20)
# Fit Bayesian linear regression
X_poly = np.column_stack([x_true**i for i in range(1, 4)]) # Polynomial features
blr = BayesianLinearRegression(alpha=2.0, beta=25.0)
blr.fit(X_poly, y_true)
# Make predictions
x_test = np.linspace(0, 1, 100)
X_test_poly = np.column_stack([x_test**i for i in range(1, 4)])
y_pred, y_std = blr.predict(X_test_poly, return_std=True)
plt.scatter(x_true, y_true, alpha=0.7, label='Data')
plt.plot(x_test, y_pred, 'r-', label='Mean prediction')
plt.fill_between(x_test, y_pred - 2*y_std, y_pred + 2*y_std, alpha=0.3, color='red', label='±2σ')
plt.xlabel('x')
plt.ylabel('y')
plt.title('Bayesian Linear Regression')
plt.legend()
plt.tight_layout()
plt.show()
print(f"GMM component weights: {gmm.weights_}")
print(f"GMM component means:\n{gmm.means_}")
17. References
- Bishop, C. M. (2006). Pattern Recognition and Machine Learning.
- Murphy, K. P. (2012). Machine Learning: A Probabilistic Perspective.
- Barber, D. (2012). Bayesian Reasoning and Machine Learning.