Understanding Back Propagation in Neural Networks
Back propagation in neural networks is a fundamental process used for training and optimizing the model by minimizing the error between the predicted output and the actual output. This is achieved by adjusting the weights and biases of the network based on the computed gradients of the loss function with respect to these parameters.
In this blog post, we will:
- Explain the concept of back propagation in the context of neural networks.
- Discuss the mathematical operations involved in this process.
- Provide examples to illustrate how back propagation works in a neural network.
Concept of Back Propagation
Back propagation is the algorithm used to compute the gradients needed to train a neural network, most commonly in supervised learning. Training involves two main phases: forward propagation and backward propagation. In forward propagation, the input data is passed through the network to obtain the output. In backward propagation, the error is propagated backward through the network, layer by layer, to obtain the gradients used to update the weights and biases.
The main goal of back propagation is to minimize the loss function, which measures the difference between the predicted output and the actual output. This is done by computing the gradient of the loss function with respect to each weight and bias, and then updating these parameters using gradient descent.
Mathematical Operations
The core of back propagation involves the following mathematical operations:
- Compute the Loss: The loss function quantifies the difference between the predicted output and the actual output. A common loss function for binary classification problems is the cross-entropy loss, defined as:
  L = -[y * log(y_hat) + (1 - y) * log(1 - y_hat)]
  where y is the actual output and y_hat is the predicted output. The cross-entropy loss for multiple samples can be written as:
  L = -(1/N) * Σ [y_i * log(y_hat_i) + (1 - y_i) * log(1 - y_hat_i)]
  where N is the number of samples.
- Compute the Gradients: The gradient of the loss function with respect to each weight and bias is computed using the chain rule. For a single neuron, the gradient of the loss with respect to the weight w is:
  dL/dw = (dL/dy_hat) * (dy_hat/dz) * (dz/dw)
  where z = w * x + b is the linear transformation, and y_hat = activation(z) is the activation function output. For a neuron with multiple inputs, the gradient computation extends to:
  dL/dw_j = (dL/dy_hat) * (dy_hat/dz) * x_j
  where w_j and x_j are the weight and input associated with the j-th input to the neuron. Because dz/db = 1, the gradient with respect to the bias is dL/db = (dL/dy_hat) * (dy_hat/dz).
- Update the Weights and Biases: The weights and biases are updated using gradient descent:
  w = w - η * (dL/dw)
  b = b - η * (dL/db)
  where η (learning rate) is a hyperparameter that controls the step size of the updates.
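The three operations above can be traced by hand for a single sigmoid neuron. The following is a minimal sketch, not a reference implementation: the input, weight, bias, and learning-rate values are illustrative assumptions, and the helper name `loss_fn` is hypothetical. It computes each chain-rule factor separately, compares the result with a finite-difference estimate, and applies one gradient descent step.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_fn(w, b, x, y):
    """Binary cross-entropy for one sample passed through one sigmoid neuron."""
    y_hat = sigmoid(np.dot(w, x) + b)
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Illustrative values (assumptions chosen for this sketch)
x = np.array([0.5, 0.1])   # two inputs
w = np.array([0.2, 0.8])   # one weight per input
b = 0.1
y = 1.0                    # actual output
eta = 0.1                  # learning rate

# Forward pass
z = np.dot(w, x) + b
y_hat = sigmoid(z)

# Chain rule, one factor at a time
dL_dy_hat = -(y / y_hat) + (1 - y) / (1 - y_hat)
dy_hat_dz = y_hat * (1 - y_hat)      # derivative of the sigmoid
dL_dz = dL_dy_hat * dy_hat_dz        # simplifies to y_hat - y
dL_dw = dL_dz * x                    # dz/dw_j = x_j
dL_db = dL_dz                        # dz/db = 1

# Finite-difference estimate of dL/dw as a sanity check
eps = 1e-6
numeric_dw = np.array([
    (loss_fn(w + eps * e, b, x, y) - loss_fn(w - eps * e, b, x, y)) / (2 * eps)
    for e in np.eye(len(w))
])

# One gradient descent step
w_new = w - eta * dL_dw
b_new = b - eta * dL_db

print("analytic dL/dw:", dL_dw)
print("numeric  dL/dw:", numeric_dw)
print("loss before:", loss_fn(w, b, x, y), "after:", loss_fn(w_new, b_new, x, y))
```

The analytic and numeric gradients should agree to several decimal places, and the loss after the step should be lower than before. Comparing against finite differences is a common way to catch mistakes in hand-derived gradients.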
Back Propagation in a Neural Network
Consider a simple neural network with one hidden layer. The back propagation process involves the following steps:
- Compute the loss at the output layer.
- Compute the gradient of the loss with respect to the output layer weights and biases.
- Propagate the error backward to the hidden layer.
- Compute the gradient of the loss with respect to the hidden layer weights and biases.
- Update the weights and biases using gradient descent.
Detailed Mathematical Derivation
Let's break down the back propagation process for a neural network with one hidden layer in more detail:
Forward Propagation Equations
For the hidden layer:
z1 = W1 * X + b1
a1 = σ(z1)
For the output layer:
z2 = W2 * a1 + b2
a2 = σ(z2)
where σ represents the activation function (e.g., sigmoid).
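The shapes involved in these equations are easier to see in code. The sketch below assumes a column-per-sample layout for X and uses made-up layer sizes and random weights. None of these values come from the equations themselves; they are only there to make the shapes concrete.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Assumed sizes, chosen only for illustration
n_in, n_hidden, n_out, n_samples = 2, 3, 1, 4

X = rng.normal(size=(n_in, n_samples))    # one column per sample
W1 = rng.normal(size=(n_hidden, n_in))
b1 = np.zeros((n_hidden, 1))
W2 = rng.normal(size=(n_out, n_hidden))
b2 = np.zeros((n_out, 1))

# Hidden layer
z1 = W1 @ X + b1      # shape (n_hidden, n_samples); b1 broadcasts across columns
a1 = sigmoid(z1)

# Output layer
z2 = W2 @ a1 + b2     # shape (n_out, n_samples)
a2 = sigmoid(z2)

for name, arr in [("z1", z1), ("a1", a1), ("z2", z2), ("a2", a2)]:
    print(name, arr.shape)
```

Each weight matrix has one row per neuron in its layer and one column per input to that layer, which is why W1 * X and W2 * a1 are matrix products rather than element-wise ones.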
Backward Propagation Equations
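Compute the error at the output layer, i.e. the derivative of the loss with respect to z2. For a sigmoid output with cross-entropy loss, this simplifies to: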
δ2 = a2 - y
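To see where this expression comes from, apply the chain rule to the single-sample cross-entropy loss with a2 = σ(z2). This derivation assumes a sigmoid output activation; other activation and loss pairings give a different δ2.

```latex
\begin{aligned}
\frac{\partial L}{\partial a_2}
  &= -\frac{y}{a_2} + \frac{1 - y}{1 - a_2}
   = \frac{a_2 - y}{a_2 (1 - a_2)}, \\
\frac{\partial a_2}{\partial z_2}
  &= \sigma'(z_2) = a_2 (1 - a_2), \\
\delta_2 = \frac{\partial L}{\partial z_2}
  &= \frac{\partial L}{\partial a_2} \cdot \frac{\partial a_2}{\partial z_2}
   = a_2 - y.
\end{aligned}
```

The factor a2 (1 - a2) cancels, so the sigmoid derivative must not be applied a second time when δ2 is used downstream.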
Compute the gradient with respect to the weights and biases of the output layer:
dL/dW2 = δ2 * a1^T
dL/db2 = δ2
Propagate the error to the hidden layer (⊙ denotes element-wise multiplication):
δ1 = (W2^T * δ2) ⊙ σ'(z1)
Compute the gradient with respect to the weights and biases of the hidden layer:
dL/dW1 = δ1 * X^T
dL/db1 = δ1
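One way to gain confidence in these equations is to compare them with numerical derivatives. The sketch below does this for a single sample, assuming sigmoid activations in both layers and the cross-entropy loss. The function names (`forward`, `loss`, `backward`, `numerical_grads`) and the random parameter values are hypothetical, introduced only for this check.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(params, X):
    W1, b1, W2, b2 = params
    a1 = sigmoid(W1 @ X + b1)
    a2 = sigmoid(W2 @ a1 + b2)
    return a1, a2

def loss(params, X, y):
    _, a2 = forward(params, X)
    return float(-np.mean(y * np.log(a2) + (1 - y) * np.log(1 - a2)))

def backward(params, X, y):
    """Gradients from the backward propagation equations (single sample)."""
    W1, b1, W2, b2 = params
    a1, a2 = forward(params, X)
    delta2 = a2 - y                              # sigmoid output + cross-entropy
    dW2 = delta2 @ a1.T
    db2 = delta2
    delta1 = (W2.T @ delta2) * a1 * (1 - a1)     # sigma'(z1) = a1 * (1 - a1)
    dW1 = delta1 @ X.T
    db1 = delta1
    return [dW1, db1, dW2, db2]

def numerical_grads(params, X, y, eps=1e-6):
    """Central finite differences, perturbing one parameter entry at a time."""
    grads = []
    for p in params:
        g = np.zeros_like(p)
        for idx in np.ndindex(p.shape):
            old = p[idx]
            p[idx] = old + eps
            plus = loss(params, X, y)
            p[idx] = old - eps
            minus = loss(params, X, y)
            p[idx] = old                         # restore the original value
            g[idx] = (plus - minus) / (2 * eps)
        grads.append(g)
    return grads

rng = np.random.default_rng(0)
X = np.array([[0.5], [0.1]])                     # one sample with two features
y = np.array([[1.0]])
params = [rng.normal(size=(2, 2)), rng.normal(size=(2, 1)),
          rng.normal(size=(1, 2)), rng.normal(size=(1, 1))]

analytic = backward(params, X, y)
numeric = numerical_grads(params, X, y)
max_diff = max(np.max(np.abs(a - n)) for a, n in zip(analytic, numeric))
print("largest analytic-vs-numeric difference:", max_diff)
```

If the printed difference is tiny compared with the gradients themselves, the equations and their implementation agree. For a batch of several samples, the bias gradients would additionally be summed (or averaged) over the sample axis.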
Example with Python Code
Let's implement back propagation in Python using NumPy. We will define a small neural network and perform back propagation to update the weights and biases:
import numpy as np

# Define activation function and its derivative
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(a):
    # Expects the sigmoid output a = sigmoid(z), not z itself
    return a * (1 - a)

# Define the loss function and its derivative
def cross_entropy_loss(y, y_hat):
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def cross_entropy_loss_derivative(y, y_hat):
    # Derivative of the loss with respect to z2 (not y_hat) for a sigmoid
    # output: the sigmoid derivative is already folded into this expression
    return y_hat - y

# Forward propagation function
def forward_propagation(X, weights, biases):
    z1 = np.dot(weights['w1'], X) + biases['b1']
    a1 = sigmoid(z1)
    z2 = np.dot(weights['w2'], a1) + biases['b2']
    output = sigmoid(z2)
    return z1, a1, z2, output

# Backward propagation function
def backward_propagation(X, y, weights, biases, z1, a1, z2, output, learning_rate):
    m = X.shape[1]  # number of samples (one column per sample)
    # Error at the output layer (delta2 = a2 - y). Multiplying by
    # sigmoid_derivative(output) here would apply the sigmoid derivative twice.
    dz2 = cross_entropy_loss_derivative(y, output)
    # Compute the gradient at the output layer, averaged over samples
    # to match the np.mean in the loss
    dw2 = np.dot(dz2, a1.T) / m
    db2 = np.sum(dz2, axis=1, keepdims=True) / m
    # Compute the gradient at the hidden layer
    dz1 = np.dot(weights['w2'].T, dz2) * sigmoid_derivative(a1)
    dw1 = np.dot(dz1, X.T) / m
    db1 = np.sum(dz1, axis=1, keepdims=True) / m
    # Update the weights and biases in place
    weights['w2'] -= learning_rate * dw2
    biases['b2'] -= learning_rate * db2
    weights['w1'] -= learning_rate * dw1
    biases['b1'] -= learning_rate * db1
# Example inputs
X = np.array([[0.5], [0.1]])
y = np.array([[1]])
weights = {
    'w1': np.array([[0.2, 0.8], [0.5, 0.1]]),
    'w2': np.array([[0.4, 0.2]])
}
biases = {
    'b1': np.array([[0.1], [0.2]]),
    'b2': np.array([[0.3]])
}
learning_rate = 0.1

# Perform forward propagation
z1, a1, z2, output = forward_propagation(X, weights, biases)

# Compute the initial loss
initial_loss = cross_entropy_loss(y, output)
print("Initial Loss:", initial_loss)

# Perform backward propagation
backward_propagation(X, y, weights, biases, z1, a1, z2, output, learning_rate)

# Perform forward propagation again to check the updated loss
_, _, _, updated_output = forward_propagation(X, weights, biases)
updated_loss = cross_entropy_loss(y, updated_output)
print("Updated Loss:", updated_loss)
In this example, we defined a neural network with an input layer, one hidden layer, and an output layer. The forward propagation function computes the output of the network, and the backward propagation function updates the weights and biases based on the computed gradients.
Interpretation of the Updated Parameters
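A single back propagation step moves the weights and biases in the direction that reduces the loss; it does not minimize the loss in one go. Repeating the forward pass, backward pass, and update over many iterations gradually improves the network's predictions. A loss that falls from one update to the next indicates that the network is learning and getting closer to the desired output, provided the learning rate is small enough that the updates do not overshoot.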
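The iterative nature of training can be sketched by repeating the update in a loop. The snippet below reuses the starting weights, biases, input, and learning rate from the example above. The number of steps (`n_steps`) is an arbitrary assumption, and the update uses the simplified output error a2 - y, which holds for a sigmoid output with cross-entropy loss.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Starting values taken from the example above
X = np.array([[0.5], [0.1]])
y = np.array([[1.0]])
W1 = np.array([[0.2, 0.8], [0.5, 0.1]])
b1 = np.array([[0.1], [0.2]])
W2 = np.array([[0.4, 0.2]])
b2 = np.array([[0.3]])
learning_rate = 0.1
n_steps = 50   # assumption: chosen only to show the trend

losses = []
for step in range(n_steps):
    # Forward pass
    a1 = sigmoid(W1 @ X + b1)
    a2 = sigmoid(W2 @ a1 + b2)
    losses.append(float(-np.mean(y * np.log(a2) + (1 - y) * np.log(1 - a2))))

    # Backward pass (both deltas are computed before any weight is changed)
    delta2 = a2 - y
    delta1 = (W2.T @ delta2) * a1 * (1 - a1)

    # Gradient descent update
    W2 -= learning_rate * (delta2 @ a1.T)
    b2 -= learning_rate * delta2
    W1 -= learning_rate * (delta1 @ X.T)
    b1 -= learning_rate * delta1

print("first loss:", losses[0])
print("last loss: ", losses[-1])
```

With a single training sample the loss simply keeps shrinking as the output approaches the target. On a real dataset one would also track the loss on held-out data to detect overfitting.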
Conclusion
Back propagation is a critical process in training neural networks, enabling the network to learn from the data by minimizing the error between the predicted and actual outputs. Understanding the mathematical operations and implementing them in code is essential for building and training neural networks effectively. This guide covered back propagation from the basic concepts through the gradient equations to an implementation in Python, which should give you a solid foundation to delve deeper into neural network training and optimization.