Neural Networks
Disclaimer: These are my personal notes compiled for my own reference and learning. They may contain errors, incomplete information, or personal interpretations. While I strive for accuracy, these notes are not peer-reviewed and should not be considered authoritative sources. Please consult official textbooks, research papers, or other reliable sources for academic or professional purposes.
1. Introduction to Neural Networks
Neural networks are computational models inspired by biological neural networks. They consist of interconnected nodes (neurons) that process information.
2. Perceptron
The simplest neural network, with a single neuron:
$$y = f(\mathbf{w}^T \mathbf{x} + b)$$
where $f$ is the activation function, $\mathbf{w}$ are the weights, $\mathbf{x}$ are the inputs, and $b$ is the bias.
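As a minimal sketch, a single-neuron forward pass can be written directly in NumPy; the step activation, AND-style weights, and bias below are illustrative choices only.

```python
import numpy as np

def perceptron(x, w, b):
    """Single-neuron forward pass: y = f(w.x + b) with a step activation."""
    z = np.dot(w, x) + b          # weighted sum plus bias
    return 1.0 if z > 0 else 0.0  # step activation f

# Illustrative values: a perceptron that computes logical AND
w = np.array([1.0, 1.0])
b = -1.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, "->", perceptron(np.array(x, dtype=float), w, b))
```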
3. Activation Functions
3.1 Sigmoid
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
Output range: $(0, 1)$. Smooth, but suffers from the vanishing gradient problem.
3.2 Hyperbolic Tangent (tanh)
$$\tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$$
Output range: $(-1, 1)$. Zero-centered, but still has vanishing gradient issues.
3.3 ReLU (Rectified Linear Unit)
$$\text{ReLU}(z) = \max(0, z)$$
The most popular activation: simple and efficient, but it can suffer from the "dying ReLU" problem.
3.4 Leaky ReLU
$$\text{LeakyReLU}(z) = \begin{cases} z & z > 0 \\ \alpha z & z \le 0 \end{cases}$$
where $\alpha$ is a small positive constant (e.g., 0.01).
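For reference, the four activations above written as small NumPy functions (a sketch; the test inputs are arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))          # output in (0, 1)

def tanh(z):
    return np.tanh(z)                         # output in (-1, 1), zero-centered

def relu(z):
    return np.maximum(0.0, z)                 # zero for negative inputs

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)      # small slope alpha for z < 0

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for f in (sigmoid, tanh, relu, leaky_relu):
    print(f.__name__, f(z))
```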
4. Multi-Layer Perceptron (MLP)
Neural network with multiple layers:
- Input layer: Receives input data
- Hidden layers: Process information
- Output layer: Produces final output
5. Forward Propagation
For a network with $L$ layers:
$$\mathbf{z}^{[l]} = \mathbf{W}^{[l]} \mathbf{a}^{[l-1]} + \mathbf{b}^{[l]}, \qquad \mathbf{a}^{[l]} = f^{[l]}(\mathbf{z}^{[l]})$$
for $l = 1, 2, \ldots, L$, with $\mathbf{a}^{[0]} = \mathbf{x}$ as the input.
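A standalone sketch of this layer-by-layer recursion, with illustrative layer sizes and random weights; hidden layers use ReLU and the output uses a sigmoid, matching the conventions of the code in section 14.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative layer sizes: 3 inputs -> 4 hidden -> 1 output
sizes = [3, 4, 1]
W = [rng.standard_normal((sizes[l + 1], sizes[l])) for l in range(len(sizes) - 1)]
b = [np.zeros((sizes[l + 1], 1)) for l in range(len(sizes) - 1)]

def forward(x):
    a = x                                     # a[0] = x
    for l in range(len(W)):
        z = W[l] @ a + b[l]                   # z[l] = W[l] a[l-1] + b[l]
        if l < len(W) - 1:
            a = np.maximum(0.0, z)            # hidden layers: ReLU
        else:
            a = 1.0 / (1.0 + np.exp(-z))      # output layer: sigmoid
    return a

x = rng.standard_normal((3, 1))               # one input column vector
print(forward(x))
```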
6. Loss Functions
6.1 Mean Squared Error (Regression)
$$L_{\text{MSE}} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
6.2 Cross-Entropy (Classification)
$$L_{\text{CE}} = -\sum_{i} y_i \log \hat{y}_i$$
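Both losses as short NumPy functions (a sketch; the small `eps` clip only guards against `log(0)` and is not part of the mathematical definition):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error for regression."""
    return np.mean((y_true - y_pred) ** 2)

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy; y_true is one-hot, y_pred holds predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0)                     # avoid log(0)
    return -np.sum(y_true * np.log(y_pred), axis=-1).mean()

y_true = np.array([[1, 0, 0], [0, 1, 0]], dtype=float)     # one-hot labels
y_pred = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])      # predicted probabilities
print(mse(np.array([1.0, 2.0]), np.array([1.1, 1.9])))
print(cross_entropy(y_true, y_pred))
```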
7. Backpropagation
Algorithm to compute gradients using the chain rule:
$$\delta^{[L]} = \nabla_{\mathbf{a}^{[L]}} L \odot f'(\mathbf{z}^{[L]}), \qquad \delta^{[l]} = \left(\mathbf{W}^{[l+1]}\right)^T \delta^{[l+1]} \odot f'(\mathbf{z}^{[l]})$$
$$\frac{\partial L}{\partial \mathbf{W}^{[l]}} = \delta^{[l]} \left(\mathbf{a}^{[l-1]}\right)^T, \qquad \frac{\partial L}{\partial \mathbf{b}^{[l]}} = \delta^{[l]}$$
where $\delta^{[l]} = \frac{\partial L}{\partial \mathbf{z}^{[l]}}$ is the error at layer $l$.
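Section 14 implements this recursion in full; as a standalone sketch, the fragment below runs one forward and one backward pass through a tiny two-layer network with made-up shapes (ReLU hidden layer, sigmoid output, cross-entropy error signal).

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up shapes: 3 inputs -> 4 hidden (ReLU) -> 1 output (sigmoid)
W1, b1 = rng.standard_normal((4, 3)), np.zeros((4, 1))
W2, b2 = rng.standard_normal((1, 4)), np.zeros((1, 1))

x = rng.standard_normal((3, 1))
y = np.array([[1.0]])

# Forward pass, keeping z and a for the backward pass
z1 = W1 @ x + b1;  a1 = np.maximum(0.0, z1)
z2 = W2 @ a1 + b2; a2 = 1 / (1 + np.exp(-z2))

# Backward pass: for a sigmoid output with cross-entropy loss, delta[L] = a2 - y
delta2 = a2 - y
dW2, db2 = delta2 @ a1.T, delta2                  # dL/dW[2], dL/db[2]
delta1 = (W2.T @ delta2) * (z1 > 0)               # delta[l] = W[l+1]^T delta[l+1] * f'(z[l])
dW1, db1 = delta1 @ x.T, delta1                   # dL/dW[1], dL/db[1]
print(dW1.shape, dW2.shape)
```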
8. Gradient Descent
8.1 Batch Gradient Descent
Update parameters using the gradient computed over the entire training set:
$$\theta \leftarrow \theta - \eta \nabla_\theta L(\theta)$$
where $\eta$ is the learning rate.
8.2 Stochastic Gradient Descent (SGD)
Update parameters using one sample at a time.
8.3 Mini-batch Gradient Descent
Update parameters using small batches of samples.
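A sketch of the mini-batch update loop; batch gradient descent and SGD are the special cases `batch_size = n` and `batch_size = 1`. The quadratic toy problem and `grad_fn` are illustrative stand-ins for any model and its gradient.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy problem: fit theta to minimize ||X theta - y||^2 (placeholder for any model)
X = rng.standard_normal((256, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.standard_normal(256)
theta = np.zeros(5)

def grad_fn(theta, Xb, yb):
    """Gradient of the mean squared error on one mini-batch."""
    return 2.0 / len(Xb) * Xb.T @ (Xb @ theta - yb)

learning_rate, batch_size, epochs = 0.05, 32, 20
n = len(X)
for epoch in range(epochs):
    perm = rng.permutation(n)                     # reshuffle each epoch
    for start in range(0, n, batch_size):
        idx = perm[start:start + batch_size]      # one mini-batch
        theta -= learning_rate * grad_fn(theta, X[idx], y[idx])

print(theta.round(2))   # should approach [1, -2, 0.5, 0, 3]
```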
9. Regularization Techniques
9.1 L2 Regularization (Weight Decay)
Add a penalty on large weights to the loss:
$$L_{\text{reg}} = L + \frac{\lambda}{2} \sum_{l} \|\mathbf{W}^{[l]}\|_2^2$$
where $\lambda$ controls the regularization strength.
9.2 Dropout
Randomly set each neuron's output to zero during training, independently with probability $p$.
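A sketch of one common formulation, inverted dropout, which rescales the surviving activations by $1/(1-p)$ during training so that nothing needs to change at test time; the input array is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)

def dropout(a, p=0.5, training=True):
    """Inverted dropout: zero each activation with probability p during training."""
    if not training or p == 0.0:
        return a                                   # identity at test time
    mask = (rng.random(a.shape) >= p) / (1.0 - p)  # keep with prob 1-p, rescale by 1/(1-p)
    return a * mask

a = np.ones((2, 8))
print(dropout(a, p=0.5))                   # roughly half the entries zeroed, the rest scaled to 2.0
print(dropout(a, p=0.5, training=False))   # unchanged at test time
```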
9.3 Batch Normalization
Normalize the inputs to each layer using the statistics of the current mini-batch:
$$\hat{x} = \frac{x - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad y = \gamma \hat{x} + \beta$$
where $\mu_B$ and $\sigma_B^2$ are the batch mean and variance, and $\gamma$, $\beta$ are learned scale and shift parameters.
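A minimal sketch of the training-time transform above, computing per-feature statistics over the batch dimension (the running averages used at test time are omitted):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Training-time batch norm: normalize each feature over the batch, then scale and shift."""
    mu = x.mean(axis=0)                     # per-feature batch mean
    var = x.var(axis=0)                     # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)   # normalized activations
    return gamma * x_hat + beta             # learned scale gamma and shift beta

rng = np.random.default_rng(4)
x = rng.standard_normal((32, 5)) * 10 + 3   # batch of 32 examples, 5 features
out = batch_norm(x, gamma=np.ones(5), beta=np.zeros(5))
print(out.mean(axis=0).round(3), out.std(axis=0).round(3))   # approximately 0 and 1
```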
10. Convolutional Neural Networks (CNNs)
Specialized for processing grid-like data (images).
10.1 Convolution Operation
$$(I * K)(i, j) = \sum_{m} \sum_{n} I(i + m, j + n)\, K(m, n)$$
(written here as cross-correlation, the convention implemented by most deep learning libraries).
10.2 Pooling
Reduce spatial dimensions (see the sketch after this list):
- Max pooling: Take maximum value in region
- Average pooling: Take average value in region
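A sketch of both operations on a made-up input: a valid 2D convolution written as cross-correlation, as most deep learning libraries do, followed by non-overlapping 2x2 max pooling. The kernel, sizes, and strides are illustrative.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid cross-correlation of a 2D image with a 2D kernel (no padding, stride 1)."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(x, size=2):
    """Non-overlapping max pooling with a size x size window."""
    H, W = x.shape
    H, W = H - H % size, W - W % size        # drop any ragged border
    x = x[:H, :W].reshape(H // size, size, W // size, size)
    return x.max(axis=(1, 3))

rng = np.random.default_rng(5)
image = rng.standard_normal((6, 6))
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])   # arbitrary 2x2 kernel
feature_map = conv2d(image, kernel)            # shape (5, 5)
print(max_pool2d(feature_map).shape)           # (2, 2): 2x2 pooling of the 5x5 map, ragged border dropped
```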
11. Recurrent Neural Networks (RNNs)
Networks with memory for sequential data; the hidden state is updated at each time step:
$$\mathbf{h}_t = f(\mathbf{W}_{hh} \mathbf{h}_{t-1} + \mathbf{W}_{xh} \mathbf{x}_t + \mathbf{b}_h)$$
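A sketch of this recurrence as a plain Elman-style RNN cell unrolled over a short random sequence; the sizes, weights, and inputs are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)

input_size, hidden_size, seq_len = 3, 4, 5       # illustrative sizes
W_xh = rng.standard_normal((hidden_size, input_size)) * 0.1
W_hh = rng.standard_normal((hidden_size, hidden_size)) * 0.1
b_h = np.zeros((hidden_size, 1))

def rnn_step(x_t, h_prev):
    """One step of the recurrence h_t = tanh(W_hh h_{t-1} + W_xh x_t + b_h)."""
    return np.tanh(W_hh @ h_prev + W_xh @ x_t + b_h)

h = np.zeros((hidden_size, 1))                   # initial hidden state
for t in range(seq_len):
    x_t = rng.standard_normal((input_size, 1))   # one input vector per time step
    h = rnn_step(x_t, h)                         # hidden state carries memory forward
print(h.ravel().round(3))
```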
11.1 LSTM (Long Short-Term Memory)
Addresses vanishing gradient problem in RNNs with gating mechanisms.
11.2 GRU (Gated Recurrent Unit)
Simplified version of LSTM with fewer parameters.
12. Deep Learning Architectures
12.1 Autoencoders
Learn compressed representations of data.
12.2 Generative Adversarial Networks (GANs)
Two networks competing: generator and discriminator.
12.3 Transformers
Attention-based models for sequence processing.
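A sketch of single-head scaled dot-product attention, the core operation inside a transformer block; `Q`, `K`, and `V` here are random stand-ins for projected token representations, and multi-head projections, masking, and positional encodings are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)      # stabilize the exponentials
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)              # pairwise similarity of queries and keys
    weights = softmax(scores, axis=-1)           # attention weights, rows sum to 1
    return weights @ V                           # weighted sum of the values

rng = np.random.default_rng(7)
seq_len, d_k = 4, 8                              # illustrative sizes
Q, K, V = (rng.standard_normal((seq_len, d_k)) for _ in range(3))
print(attention(Q, K, V).shape)                  # (4, 8): one output per query position
```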
13. Training Tips
- Weight initialization: Xavier/He initialization
- Learning rate scheduling: Decay over time
- Early stopping: Stop when validation loss stops improving (see the sketch after this list)
- Data augmentation: Increase training data diversity
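A sketch combining two of these tips, exponential learning-rate decay and early stopping with patience; `train_one_epoch` and `validation_loss` are hypothetical callbacks standing in for whatever training loop is used (for example, the `NeuralNetwork` class in section 14).

```python
def fit_with_early_stopping(train_one_epoch, validation_loss,
                            initial_lr=0.1, decay=0.95,
                            max_epochs=200, patience=10):
    """train_one_epoch(lr) and validation_loss() are hypothetical callbacks."""
    best_loss, best_epoch = float("inf"), 0
    lr = initial_lr
    for epoch in range(max_epochs):
        train_one_epoch(lr)                          # one pass over the training data
        val_loss = validation_loss()                 # evaluate on held-out data
        if val_loss < best_loss:
            best_loss, best_epoch = val_loss, epoch  # new best: reset the patience window
        elif epoch - best_epoch >= patience:
            print(f"early stop at epoch {epoch}, best val loss {best_loss:.4f}")
            break
        lr *= decay                                  # exponential learning-rate decay
    return best_loss

# Toy usage with dummy callbacks (replace with real training / validation code)
losses = iter([0.9, 0.7, 0.6, 0.61, 0.62, 0.63] + [0.64] * 50)
fit_with_early_stopping(train_one_epoch=lambda lr: None,
                        validation_loss=lambda: next(losses))
```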
14. Code Example
```python
# Python implementation of a simple neural network
import numpy as np


class NeuralNetwork:
    def __init__(self, layers):
        """
        layers: list of layer sizes [input_size, hidden1, hidden2, ..., output_size]
        """
        self.layers = layers
        self.num_layers = len(layers)
        # Initialize weights and biases
        self.weights = []
        self.biases = []
        for i in range(1, self.num_layers):
            # He initialization (scaled for the ReLU hidden layers)
            w = np.random.randn(layers[i], layers[i-1]) * np.sqrt(2.0 / layers[i-1])
            b = np.zeros((layers[i], 1))
            self.weights.append(w)
            self.biases.append(b)

    def sigmoid(self, z):
        """Sigmoid activation function"""
        return 1 / (1 + np.exp(-np.clip(z, -500, 500)))

    def sigmoid_derivative(self, z):
        """Derivative of sigmoid function"""
        s = self.sigmoid(z)
        return s * (1 - s)

    def relu(self, z):
        """ReLU activation function"""
        return np.maximum(0, z)

    def relu_derivative(self, z):
        """Derivative of ReLU function"""
        return (z > 0).astype(float)

    def forward_propagation(self, X):
        """Forward pass through the network"""
        activations = [X]
        zs = []
        for i in range(self.num_layers - 1):
            z = np.dot(self.weights[i], activations[-1]) + self.biases[i]
            zs.append(z)
            if i < self.num_layers - 2:  # Hidden layers
                a = self.relu(z)
            else:  # Output layer
                a = self.sigmoid(z)
            activations.append(a)
        return activations, zs

    def backward_propagation(self, X, y, activations, zs):
        """Backward pass to compute gradients"""
        m = X.shape[1]  # Number of examples
        # Initialize gradients
        dW = [np.zeros(w.shape) for w in self.weights]
        db = [np.zeros(b.shape) for b in self.biases]
        # Output layer error (sigmoid output with cross-entropy cost)
        delta = activations[-1] - y
        # Backpropagate the error
        for i in range(self.num_layers - 2, -1, -1):
            dW[i] = (1/m) * np.dot(delta, activations[i].T)
            db[i] = (1/m) * np.sum(delta, axis=1, keepdims=True)
            if i > 0:  # Not the first layer
                delta = np.dot(self.weights[i].T, delta) * self.relu_derivative(zs[i-1])
        return dW, db

    def update_parameters(self, dW, db, learning_rate):
        """Update weights and biases"""
        for i in range(len(self.weights)):
            self.weights[i] -= learning_rate * dW[i]
            self.biases[i] -= learning_rate * db[i]

    def compute_cost(self, y_pred, y_true):
        """Compute binary cross-entropy cost"""
        m = y_true.shape[1]
        y_pred = np.clip(y_pred, 1e-12, 1 - 1e-12)  # avoid log(0)
        cost = -(1/m) * np.sum(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
        return cost

    def train(self, X, y, epochs, learning_rate):
        """Train the neural network"""
        costs = []
        for epoch in range(epochs):
            # Forward propagation
            activations, zs = self.forward_propagation(X)
            # Compute cost
            cost = self.compute_cost(activations[-1], y)
            costs.append(cost)
            # Backward propagation
            dW, db = self.backward_propagation(X, y, activations, zs)
            # Update parameters
            self.update_parameters(dW, db, learning_rate)
            if epoch % 100 == 0:
                print(f"Cost after epoch {epoch}: {cost}")
        return costs

    def predict(self, X):
        """Make predictions"""
        activations, _ = self.forward_propagation(X)
        return activations[-1] > 0.5


# Example usage
if __name__ == "__main__":
    # Generate sample data (XOR problem)
    X = np.array([[0, 0, 1, 1], [0, 1, 0, 1]])
    y = np.array([[0, 1, 1, 0]])
    # Create and train network
    nn = NeuralNetwork([2, 4, 1])
    costs = nn.train(X, y, epochs=1000, learning_rate=1.0)
    # Make predictions
    predictions = nn.predict(X)
    print("Predictions:", predictions.astype(int))
    print("Actual:", y.astype(int))
```