Understanding Back Propagation in Neural Networks | Essential Guide

Back propagation in neural networks is a fundamental process used for training and optimizing the model by minimizing the error between the predicted output and the actual output. This is achieved by adjusting the weights and biases of the network based on the computed gradients of the loss function with respect to these parameters.

In this blog post, we will:

Explain the concept of back propagation in the context of neural networks.
Discuss the mathematical operations involved in this process.
Provide examples to illustrate how back propagation works in a neural network.

Concept of Back Propagation

Back propagation is the algorithm used to compute gradients when training neural networks with supervised learning. Training alternates between two phases: forward propagation and backward propagation. In forward propagation, the input data is passed through the network to obtain the output. In backward propagation, the error is propagated backward through the network, layer by layer, to obtain the gradient of the loss with respect to every weight and bias.

The goal of training is to minimize the loss function, which measures the difference between the predicted output and the actual output. Back propagation supplies the gradient of the loss function with respect to each weight and bias, and gradient descent then uses these gradients to update the parameters.

Mathematical Operations

The core of back propagation involves the following mathematical operations:

Compute the Loss: The loss function quantifies the difference between the predicted output and the actual output. A common loss function for binary classification problems is the binary cross-entropy loss, defined for a single sample as:

L = -[y * log(y_hat) + (1 - y) * log(1 - y_hat)]

where y is the actual output (0 or 1), and y_hat is the predicted probability, which must lie strictly between 0 and 1 for the logarithms to be defined. Averaged over multiple samples, the cross-entropy loss can be written as:

L = - (1/N) * Σ [y_i * log(y_hat_i) + (1 - y_i) * log(1 - y_hat_i)]

where N is the number of samples.

Compute the Gradients: The gradient of the loss function with respect to each weight and bias is computed using the chain rule. For a single neuron, the gradient of the loss with respect to the weight w is:

dL/dw = (dL/dy_hat) * (dy_hat/dz) * (dz/dw)

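where z = w * x + b is the linear transformation, and y_hat = activation(z) is the activation function output. Because dz/dw = x and dz/db = 1, the bias gradient is the same product without the input factor: dL/db = (dL/dy_hat) * (dy_hat/dz).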

For a neuron with multiple inputs, the gradient computation extends to:

dL/dw_j = (dL/dy_hat) * (dy_hat/dz) * x_j

where w_j and x_j are the weight and input value for the j-th input to the neuron. Here z = Σ_j (w_j * x_j) + b, so dz/dw_j = x_j.

Update the Weights and Biases: The weights and biases are updated using gradient descent:

w = w - η * (dL/dw)

b = b - η * (dL/db)

where η (learning rate) is a hyperparameter that controls the step size of the updates.
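
The three operations above can be sketched for a single sigmoid neuron with one input. This is a minimal illustration, not a reference implementation: the function names (`neuron_loss`, `neuron_gradients`) and the numeric values for w, b, x, y, and η are assumptions chosen for the example. A finite-difference estimate is included as a sanity check on the chain-rule gradient.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce_loss(y, y_hat):
    # Binary cross-entropy for a single sample
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

def neuron_loss(w, b, x, y):
    # Loss of one sigmoid neuron: y_hat = sigmoid(w * x + b)
    return bce_loss(y, sigmoid(w * x + b))

def neuron_gradients(w, b, x, y):
    # Chain rule: dL/dw = (dL/dy_hat) * (dy_hat/dz) * (dz/dw)
    y_hat = sigmoid(w * x + b)
    dL_dy_hat = (y_hat - y) / (y_hat * (1 - y_hat))
    dy_hat_dz = y_hat * (1 - y_hat)      # derivative of the sigmoid
    dL_dw = dL_dy_hat * dy_hat_dz * x    # dz/dw = x
    dL_db = dL_dy_hat * dy_hat_dz        # dz/db = 1
    return dL_dw, dL_db

# Illustrative values (assumptions, not taken from the text above)
w, b, x, y, eta = 0.5, 0.1, 2.0, 1.0, 0.1

dL_dw, dL_db = neuron_gradients(w, b, x, y)

# Central finite difference as an independent estimate of dL/dw
eps = 1e-6
numeric_dw = (neuron_loss(w + eps, b, x, y) - neuron_loss(w - eps, b, x, y)) / (2 * eps)

# One gradient descent step
loss_before = neuron_loss(w, b, x, y)
w_new = w - eta * dL_dw
b_new = b - eta * dL_db
loss_after = neuron_loss(w_new, b_new, x, y)

print("analytic dL/dw:", dL_dw, "numeric dL/dw:", numeric_dw)
print("loss before:", loss_before, "loss after:", loss_after)
```

Note that the product (dL/dy_hat) * (dy_hat/dz) simplifies to y_hat - y for this particular pairing of sigmoid activation and cross-entropy loss.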

Back Propagation in a Neural Network

Consider a simple neural network with one hidden layer. The back propagation process involves the following steps:

Compute the loss at the output layer.
Compute the gradient of the loss with respect to the output layer weights and biases.
Propagate the error backward to the hidden layer.
Compute the gradient of the loss with respect to the hidden layer weights and biases.
Update the weights and biases using gradient descent.

Detailed Mathematical Derivation

Let's break down the back propagation process for a neural network with one hidden layer in more detail:

Forward Propagation Equations

For the hidden layer:

z1 = W1 * X + b1

a1 = σ(z1)

For the output layer:

z2 = W2 * a1 + b2

a2 = σ(z2)

where σ represents the activation function (e.g., sigmoid).

Backward Propagation Equations

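Compute the error at the output layer. For a sigmoid output combined with the cross-entropy loss, the derivative of the loss with respect to z2 simplifies to: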

δ2 = a2 - y

Compute the gradient with respect to the weights and biases of the output layer:

dL/dW2 = δ2 * a1^T

dL/db2 = δ2

Propagate the error to the hidden layer (⊙ denotes element-wise multiplication):

δ1 = ((W2^T) * δ2) ⊙ σ'(z1)

Compute the gradient with respect to the weights and biases of the hidden layer:

dL/dW1 = δ1 * X^T

dL/db1 = δ1
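
One way to gain confidence in these equations is to compare them against numerical gradients. The sketch below assumes a sigmoid activation in both layers, a single training sample, and arbitrary layer sizes (3 inputs, 4 hidden units, 1 output) with random parameters; the helper names (`forward`, `backward`, `numerical_gradient`) are hypothetical and introduced only for this check.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(params, X):
    W1, b1, W2, b2 = params
    z1 = W1 @ X + b1
    a1 = sigmoid(z1)
    z2 = W2 @ a1 + b2
    a2 = sigmoid(z2)
    return z1, a1, z2, a2

def loss(params, X, y):
    # Cross-entropy for a single sample
    a2 = forward(params, X)[3]
    return float(-np.sum(y * np.log(a2) + (1 - y) * np.log(1 - a2)))

def backward(params, X, y):
    W1, b1, W2, b2 = params
    z1, a1, z2, a2 = forward(params, X)
    delta2 = a2 - y                            # output-layer error
    dW2 = delta2 @ a1.T
    db2 = delta2
    delta1 = (W2.T @ delta2) * a1 * (1 - a1)   # sigma'(z1) = a1 * (1 - a1)
    dW1 = delta1 @ X.T
    db1 = delta1
    return dW1, db1, dW2, db2

def numerical_gradient(params, index, X, y, eps=1e-6):
    # Central finite differences, one parameter entry at a time
    grad = np.zeros_like(params[index])
    for pos in np.ndindex(params[index].shape):
        original = params[index][pos]
        params[index][pos] = original + eps
        loss_plus = loss(params, X, y)
        params[index][pos] = original - eps
        loss_minus = loss(params, X, y)
        params[index][pos] = original
        grad[pos] = (loss_plus - loss_minus) / (2 * eps)
    return grad

# Arbitrary sizes and random values (assumptions for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 1))
y = np.array([[1.0]])
params = [rng.normal(size=(4, 3)), rng.normal(size=(4, 1)),
          rng.normal(size=(1, 4)), rng.normal(size=(1, 1))]

analytic = backward(params, X, y)
max_error = max(
    float(np.max(np.abs(analytic[i] - numerical_gradient(params, i, X, y))))
    for i in range(4)
)
print("Largest analytic-vs-numeric difference:", max_error)
```

If the analytic and numerical gradients disagree by more than a small tolerance, the usual suspects are a missing transpose or a matrix product used where an element-wise product was intended.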

Example with Python Code

Let's implement back propagation in Python using NumPy. We will define a small neural network and perform back propagation to update the weights and biases:

import numpy as np

# Define activation function and its derivative
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(a):
    # Expects the sigmoid output a = sigmoid(z), not z itself:
    # sigma'(z) = a * (1 - a)
    return a * (1 - a)

# Define the loss function and its derivative
def cross_entropy_loss(y, y_hat):
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

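def cross_entropy_loss_derivative(y, y_hat):
    # dL/dy_hat for a single sample. Multiplying this by the sigmoid
    # derivative y_hat * (1 - y_hat) gives dL/dz = y_hat - y.
    return (y_hat - y) / (y_hat * (1 - y_hat))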

# Forward propagation function
def forward_propagation(X, weights, biases):
    z1 = np.dot(weights['w1'], X) + biases['b1']
    a1 = sigmoid(z1)
    z2 = np.dot(weights['w2'], a1) + biases['b2']
    output = sigmoid(z2)
    return z1, a1, z2, output

# Backward propagation function
def backward_propagation(X, y, weights, biases, z1, a1, z2, output, learning_rate):
    # Compute the loss derivative
    dL_dy_hat = cross_entropy_loss_derivative(y, output)
    
    # Compute the gradient at the output layer
    dz2 = dL_dy_hat * sigmoid_derivative(output)
    dw2 = np.dot(dz2, a1.T)
    db2 = np.sum(dz2, axis=1, keepdims=True)
    
    # Compute the gradient at the hidden layer
    dz1 = np.dot(weights['w2'].T, dz2) * sigmoid_derivative(a1)
    dw1 = np.dot(dz1, X.T)
    db1 = np.sum(dz1, axis=1, keepdims=True)
    
    # Update the weights and biases
    weights['w2'] -= learning_rate * dw2
    biases['b2'] -= learning_rate * db2
    weights['w1'] -= learning_rate * dw1
    biases['b1'] -= learning_rate * db1

# Example inputs
X = np.array([[0.5], [0.1]])
y = np.array([[1]])
weights = {
    'w1': np.array([[0.2, 0.8], [0.5, 0.1]]),
    'w2': np.array([[0.4, 0.2]])
}
biases = {
    'b1': np.array([[0.1], [0.2]]),
    'b2': np.array([[0.3]])
}
learning_rate = 0.1

# Perform forward propagation
z1, a1, z2, output = forward_propagation(X, weights, biases)

# Compute the initial loss
initial_loss = cross_entropy_loss(y, output)
print("Initial Loss:", initial_loss)

# Perform backward propagation
backward_propagation(X, y, weights, biases, z1, a1, z2, output, learning_rate)

# Perform forward propagation again to check the updated loss
_, _, _, updated_output = forward_propagation(X, weights, biases)
updated_loss = cross_entropy_loss(y, updated_output)
print("Updated Loss:", updated_loss)

In this example, we defined a neural network with two inputs, one hidden layer of two neurons, and a single output neuron, and trained it on one sample. The forward propagation function computes the output of the network, and the backward propagation function computes the gradients and updates the weights and biases in place.

Interpretation of the Updated Parameters

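After performing back propagation, the weights and biases have moved one step in the direction that reduces the loss. Repeating the forward and backward passes adjusts the parameters iteratively. With a suitably small learning rate, the loss on the training data generally decreases from one update to the next, which indicates that the network's output is moving closer to the desired output. A learning rate that is too large can cause the loss to increase instead.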
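
To see this iterative behavior, the single update can be repeated in a loop. The sketch below is self-contained and reuses the example inputs and initial parameters from the code above; the number of steps (50) is an arbitrary choice, and the output-layer error uses the simplified form a2 - y, which assumes a sigmoid output with cross-entropy loss.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Same example inputs and initial parameters as in the code above
X = np.array([[0.5], [0.1]])
y = np.array([[1.0]])
W1 = np.array([[0.2, 0.8], [0.5, 0.1]])
b1 = np.array([[0.1], [0.2]])
W2 = np.array([[0.4, 0.2]])
b2 = np.array([[0.3]])
learning_rate = 0.1
num_steps = 50  # arbitrary choice for illustration

losses = []
for step in range(num_steps):
    # Forward pass
    a1 = sigmoid(W1 @ X + b1)
    a2 = sigmoid(W2 @ a1 + b2)
    losses.append(float(-np.mean(y * np.log(a2) + (1 - y) * np.log(1 - a2))))

    # Backward pass: both errors are computed before any parameter changes
    delta2 = a2 - y
    delta1 = (W2.T @ delta2) * a1 * (1 - a1)

    # Gradient descent update
    W2 -= learning_rate * (delta2 @ a1.T)
    b2 -= learning_rate * delta2
    W1 -= learning_rate * (delta1 @ X.T)
    b1 -= learning_rate * delta1

print("First loss:", losses[0])
print("Last loss:", losses[-1])
```

Training on a single sample like this only demonstrates the mechanics; it says nothing about how well the network would generalize to new data.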

Conclusion

Back propagation is a critical process in training neural networks, enabling the network to learn from the data by minimizing the error between the predicted and actual outputs. Understanding the mathematical operations and implementing them in code is essential for building and training neural networks effectively. This guide covered back propagation from the basic concepts to an implementation in Python, which gives you a foundation for exploring neural network training and optimization in more depth.
