Forward Propagation
The forward pass in a neural network processes input data through the network's layers, sequentially applying weights, biases, and activation functions at each layer to produce the network's predictions.
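As a minimal sketch of that sequential flow (the function name `forward_pass` and the per-layer lists `weights`, `biases`, and `activations` are illustrative assumptions, not part of the implementation shown later):

```python
import numpy as np

def forward_pass(x, weights, biases, activations):
    """Generic forward pass: propagate x through each layer in turn.
    weights/biases/activations are hypothetical per-layer lists."""
    a = x
    for W, b, g in zip(weights, biases, activations):
        z = W @ a + b  # weighted sum (pre-activation) for this layer
        a = g(z)       # apply the layer's activation function
    return a           # output of the final layer, i.e., the prediction
```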
Mathematical Representation:
For one training example \(x^{(i)}\):
- Hidden Layer Computations:
    - Weighted Sum Calculation:
        - \(z^{[l](i)} = W^{[l]} a^{[l-1](i)} + b^{[l]}\)
        - Multiply the previous layer's activations (\(a^{[l-1](i)}\)) by the weights (\(W^{[l]}\)) and add the bias (\(b^{[l]}\)) to obtain the weighted sum for layer \(l\).
    - Activation Calculation:
        - \(a^{[l](i)} = g(z^{[l](i)})\)
        - Apply an activation function \(g\) (such as sigmoid, tanh, or ReLU) to the weighted sum to obtain the activation of layer \(l\).
- Output Layer Computations:
    - Weighted Sum Calculation:
        - \(z^{[L](i)} = W^{[L]} a^{[L-1](i)} + b^{[L]}\)
        - Multiply the last hidden layer's activations (\(a^{[L-1](i)}\)) by the weights (\(W^{[L]}\)) and add the bias (\(b^{[L]}\)) to obtain the weighted sum for the output layer \(L\).
    - Activation Calculation:
        - \(\hat{y}^{(i)} = a^{[L](i)} = \sigma(z^{[L](i)})\)
        - Apply an activation function appropriate to the task (e.g., sigmoid for binary classification, softmax for multiclass classification) to the weighted sum to obtain the predicted output (\(\hat{y}^{(i)}\)).
- Cost Function:
    - \(J = - \frac{1}{m} \sum\limits_{i = 1}^{m} \left(y^{(i)}\log\left(a^{[L](i)}\right) + (1-y^{(i)})\log\left(1- a^{[L](i)}\right)\right)\)
    - Compute an appropriate cost function (here the binary cross-entropy; mean squared error is another common choice) to measure the discrepancy between the predicted output (\(a^{[L](i)}\)) and the actual output (\(y^{(i)}\)) across all \(m\) training examples, as the sketch below illustrates.
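As a concrete illustration of this cost, here is a minimal numpy sketch (the names `compute_cost`, `A_L`, and `Y`, as well as the `eps` guard, are assumptions for this example, not taken from the code section below):

```python
import numpy as np

def compute_cost(A_L, Y):
    """Binary cross-entropy between predictions A_L and labels Y,
    both of shape (1, m). Illustrative sketch."""
    m = Y.shape[1]
    eps = 1e-12  # guards the logs against log(0); an assumed detail
    cost = -np.sum(Y * np.log(A_L + eps) + (1 - Y) * np.log(1 - A_L + eps)) / m
    return float(cost)
```

For example, `compute_cost(np.array([[0.9, 0.2]]), np.array([[1, 0]]))` returns a small cost because both predictions are close to their labels.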
Code
```python
import numpy as np

def forward_propagation(X, parameters):
    """
    Argument:
    X -- input data of size (n_x, m)
    parameters -- python dictionary containing your parameters (output of initialization function)

    Returns:
    A2 -- The sigmoid output of the second activation
    cache -- a dictionary containing "Z1", "A1", "Z2" and "A2"
    """
    # Retrieve each parameter from the dictionary "parameters"
    W1, b1, W2, b2 = parameters["W1"], parameters["b1"], parameters["W2"], parameters["b2"]

    # Hidden layer: weighted sum, then tanh activation
    Z1 = W1 @ X + b1
    A1 = np.tanh(Z1)

    # Output layer: weighted sum, then sigmoid activation
    Z2 = W2 @ A1 + b2
    A2 = 1 / (1 + np.exp(-Z2))

    # Store intermediate values for reuse during backpropagation
    cache = {"Z1": Z1,
             "A1": A1,
             "Z2": Z2,
             "A2": A2}

    return A2, cache
```
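A quick usage sketch, assuming small randomly initialized parameters with the shapes `forward_propagation` expects (the sizes below are arbitrary choices for illustration):

```python
rng = np.random.default_rng(0)
n_x, n_h, n_y, m = 2, 4, 1, 5  # input size, hidden units, output size, examples (assumed)

parameters = {"W1": rng.standard_normal((n_h, n_x)) * 0.01,
              "b1": np.zeros((n_h, 1)),
              "W2": rng.standard_normal((n_y, n_h)) * 0.01,
              "b2": np.zeros((n_y, 1))}

X = rng.standard_normal((n_x, m))
A2, cache = forward_propagation(X, parameters)
print(A2.shape)  # (1, 5): one sigmoid output per training example
```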