import time
import numpy as np
import h5py
import matplotlib.pyplot as plt
import scipy
from PIL import Image
from scipy import ndimage
%matplotlib inline
plt.rcParams['figure.figsize'] = (5.0, 4.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

np.random.seed(1)
Building a Cat Image Classifier using Neural Networks from Scratch
Repo link : https://github.com/SherryS997/cat-image-classification-neural-network-numpy-scipy
Introduction
The following Jupyter Notebook encapsulates the step-by-step creation of a cat image classifier using Neural Networks, entirely coded from scratch using Python libraries like NumPy, SciPy, and Matplotlib. This project aims to distinguish cat images from non-cat images using machine learning techniques implemented through a neural network architecture.
The notebook begins by loading the necessary libraries and datasets containing both training and test images of cats and non-cats. It further explores the dataset’s structure, dimensions, and the process of standardizing image data to be used in the neural network.
The construction of the neural network includes defining activation functions (sigmoid and ReLU), initializing parameters, implementing forward and backward propagation, and updating parameters via gradient descent. The model’s performance is assessed using various functions to compute the cost, make predictions, and visualize mislabeled images.
Library Imports and Environment Setup
This section initializes the notebook by importing essential libraries and configuring the environment. It includes importing standard libraries such as NumPy, SciPy, Matplotlib, and PIL (Python Imaging Library). Additionally, the section sets up the notebook environment by configuring parameters for Matplotlib plots, ensuring consistent visualization settings throughout the notebook. This step is crucial as it prepares the groundwork for subsequent data handling, model building, and result visualization within the notebook.
Data Loading and Exploration
Dataset Loading and Preprocessing
This section of the code focuses on loading the cat image dataset, which includes both training and test sets, from .h5 files. It reads the data and relevant labels, separating them into training features (train_x_orig) and labels (train_y), and test features (test_x_orig) and labels (test_y). Additionally, it retrieves the classes/categories from the dataset.
The data is loaded using the h5py library and converted into NumPy arrays for further processing. Reshaping and organizing the label data ensure compatibility with subsequent operations. This step is crucial as it lays the foundation for subsequent data preprocessing and model development stages.
train_dataset = h5py.File('datasets/train_catvnoncat.h5', "r")
train_x_orig = np.array(train_dataset["train_set_x"][:]) # your train set features
train_y = np.array(train_dataset["train_set_y"][:]) # your train set labels

test_dataset = h5py.File('datasets/test_catvnoncat.h5', "r")
test_x_orig = np.array(test_dataset["test_set_x"][:]) # your test set features
test_y = np.array(test_dataset["test_set_y"][:]) # your test set labels

classes = np.array(test_dataset["list_classes"][:]) # the list of classes

train_y = train_y.reshape((1, train_y.shape[0]))
test_y = test_y.reshape((1, test_y.shape[0]))
Visualizing Dataset Samples and Labels
This section presents a visualization of the dataset samples along with their respective labels. Using matplotlib, it displays a specific image from the training set identified by its index. Additionally, the label associated with the image is shown, providing clarity about the class it represents. In this instance, the output displays an example image labeled as a non-cat (y = 0), portraying a picture that corresponds to a hummingbird.
# Example of a picture
index = 10
plt.imshow(train_x_orig[index])
print ("y = " + str(train_y[0,index]) + ". It's a " + classes[train_y[0,index]].decode("utf-8") + " picture.")
y = 0. It's a non-cat picture.
Dataset Overview and Structure
This section provides an insight into the dataset statistics, presenting essential information such as the number of training and testing examples, along with details regarding the dimensions and size of each image. It displays the number of training and testing examples, specifying the dimensions and shape of the image arrays. This exploration helps understand the dataset’s structure and prepares for preprocessing and model development.
# Explore your dataset
m_train = train_x_orig.shape[0]
num_px = train_x_orig.shape[1]
m_test = test_x_orig.shape[0]
print ("Number of training examples: " + str(m_train))
print ("Number of testing examples: " + str(m_test))
print ("Each image is of size: (" + str(num_px) + ", " + str(num_px) + ", 3)")
print ("train_x_orig shape: " + str(train_x_orig.shape))
print ("train_y shape: " + str(train_y.shape))
print ("test_x_orig shape: " + str(test_x_orig.shape))
print ("test_y shape: " + str(test_y.shape))
Number of training examples: 209
Number of testing examples: 50
Each image is of size: (64, 64, 3)
train_x_orig shape: (209, 64, 64, 3)
train_y shape: (1, 209)
test_x_orig shape: (50, 64, 64, 3)
test_y shape: (1, 50)
The output from the code snippet provides crucial insights into the dataset structure and composition:
Number of training examples: The dataset comprises 209 training examples, which serves as the data used for training the classification model.
Number of testing examples: There are 50 testing examples, utilized to evaluate the trained model’s performance on unseen data.
Image size: Each image in the dataset has a dimension of (64, 64, 3), indicating that the images are 64 pixels in height and width with three color channels (RGB).
Train and test set shapes: The training set, denoted by train_x_orig, consists of 209 images, each with a size of (64, 64, 3). The associated training labels (train_y) have a shape of (1, 209). Similarly, the test set (test_x_orig) comprises 50 images with the same dimensions as the training images, and the corresponding test labels (test_y) possess a shape of (1, 50).
Understanding these dataset statistics is vital for various tasks, such as data preprocessing, designing the neural network architecture, and assessing the model’s performance during training and testing phases.
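As a small supplement to these statistics, the class balance of the training labels can be checked directly from train_y. The snippet below is an illustrative addition and not part of the original notebook output:
# Illustrative: count cat (1) vs. non-cat (0) examples in the training labels.
num_cats = int(np.sum(train_y))
print("cat examples:", num_cats, "| non-cat examples:", int(m_train - num_cats))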
Data Preprocessing
Data Reshaping and Normalization
This section involves the preprocessing steps applied to the dataset before feeding it into the neural network model. It includes two key operations:
Reshaping: The code snippet reshapes the training and test examples. Using the reshape() function, the images are flattened to a 1D array while preserving the number of samples. This operation is crucial as it transforms the multi-dimensional image data into a format suitable for further processing.
Normalization: Normalization is performed to standardize the pixel values of the images. The pixel values, initially ranging from 0 to 255, are scaled to be between 0 and 1 by dividing each pixel value by 255. Normalizing the data helps in achieving uniformity and stability during training, aiding the neural network to learn effectively without being skewed by varying pixel ranges. The standardized data is denoted as train_x and test_x.
These preprocessing steps are essential for preparing the data before training the neural network, ensuring that the model learns effectively and efficiently from the input images.
# Reshape the training and test examples
train_x_flatten = train_x_orig.reshape(train_x_orig.shape[0], -1).T # The "-1" makes reshape flatten the remaining dimensions
test_x_flatten = test_x_orig.reshape(test_x_orig.shape[0], -1).T

# Standardize data to have feature values between 0 and 1.
train_x = train_x_flatten/255.
test_x = test_x_flatten/255.
print ("train_x's shape: " + str(train_x.shape))
print ("test_x's shape: " + str(test_x.shape))
train_x's shape: (12288, 209)
test_x's shape: (12288, 50)
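To confirm the flattening convention (each column of train_x is one image, flattened in row-major order and scaled to [0, 1]), a quick round-trip check can reconstruct an image from its column. This snippet is an illustrative addition, not part of the original notebook:
# Illustrative: recover the image at `index` from its flattened, normalized column.
recovered = train_x[:, index].reshape((num_px, num_px, 3))
print(np.allclose(recovered, train_x_orig[index] / 255.))  # expected: True
plt.imshow(recovered)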
Neural Network Architecture
Neural Network Architecture Definition
Description: This section defines the architecture of the neural network based on specific constants and parameters. It establishes the structure of the model by setting the dimensions of each layer, including the input layer, hidden layers, and output layer. The constants n_x, n_h, and n_y represent the sizes of the input features, the hidden layer, and the output respectively. Additionally, the layers_dims list specifies the dimensions of each layer in a 4-layer neural network, providing insight into the overall structure of the model. The learning_rate parameter, crucial for optimization, is also initialized here.
### CONSTANTS DEFINING THE MODEL ####
n_x = num_px * num_px * 3
n_h = 7
n_y = 1
layers_dims = [12288, 20, 7, 5, 1] # 4-layer model
learning_rate = 0.0075
Deep Neural Network Parameter Initialization
The function “initialize_parameters_deep” sets up the initial values for the weights (W) and biases (b) of a deep neural network with multiple layers. It takes in an array (layer_dims) containing the dimensions of each layer in the network.
This function initializes the parameters for each layer (W1, b1, …, WL, bL) as a Python dictionary (parameters). For each layer l, it generates random weights (Wl) of shape (layer_dims[l], layer_dims[l-1]) from a Gaussian distribution and scales them by 1/np.sqrt(layer_dims[l-1]). The biases (bl) are initialized as zero vectors of shape (layer_dims[l], 1).
The purpose of this initialization is to provide suitable starting values for the weights and biases, aiding in the convergence of the neural network during training. The appropriate initialization helps prevent issues like vanishing/exploding gradients and contributes to better learning in the subsequent training phases of the network.
def initialize_parameters_deep(layer_dims):
    """
    Arguments:
    layer_dims -- python array (list) containing the dimensions of each layer in our network

    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                    Wl -- weight matrix of shape (layer_dims[l], layer_dims[l-1])
                    bl -- bias vector of shape (layer_dims[l], 1)
    """
    np.random.seed(1)
    parameters = {}
    L = len(layer_dims) # number of layers in the network

    for l in range(1, L):
        parameters['W' + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1]) / np.sqrt(layer_dims[l-1]) #*0.01
        parameters['b' + str(l)] = np.zeros((layer_dims[l], 1))

        assert(parameters['W' + str(l)].shape == (layer_dims[l], layer_dims[l-1]))
        assert(parameters['b' + str(l)].shape == (layer_dims[l], 1))

    return parameters
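To make the resulting shapes concrete, the following quick check (an illustrative addition, not part of the original notebook) prints the parameter shapes produced for the 4-layer configuration defined above:
# Illustrative: inspect the shapes produced for layers_dims = [12288, 20, 7, 5, 1].
params_demo = initialize_parameters_deep(layers_dims)
for l in range(1, len(layers_dims)):
    print("W" + str(l), params_demo["W" + str(l)].shape, "| b" + str(l), params_demo["b" + str(l)].shape)
# Expected: W1 (20, 12288), W2 (7, 20), W3 (5, 7), W4 (1, 5), each with a matching (n, 1) bias.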
Activation Functions Implementation (Sigmoid and ReLU)
This section in the code comprises the implementation of two fundamental activation functions used in neural networks: Sigmoid and Rectified Linear Unit (ReLU).
- Sigmoid Activation Function:
  - Definition: The sigmoid activation function is implemented to squash the input values between 0 and 1, facilitating non-linear transformations in neural networks.
  - Implementation: The sigmoid function takes in any numpy array Z and computes the output A using the formula A = 1 / (1 + np.exp(-Z)). It returns the computed A and also caches the input Z for efficient backpropagation during the training process.
  - Backward Propagation: The sigmoid_backward function computes the gradient of the cost with respect to Z during backpropagation. It utilizes the cached Z and the incoming gradient dA to compute dZ, which is essential for updating weights in the network.
- ReLU (Rectified Linear Unit) Activation Function:
  - Definition: ReLU is a widely used activation function that introduces non-linearity by setting all negative values to zero and leaving positive values unchanged.
  - Implementation: The relu function takes the output Z from a linear layer and computes the post-activation output A using A = np.maximum(0, Z). It also caches the input Z for efficient backward pass computation.
  - Backward Propagation: The relu_backward function calculates the gradient of the cost with respect to Z for a single ReLU unit. It utilizes the cached Z and incoming gradient dA to compute dZ. It ensures that for Z <= 0, the gradient dZ is set to zero to handle the non-differentiability at Z = 0.
These activation functions, Sigmoid and ReLU, play pivotal roles in introducing non-linearities within neural networks, aiding in learning complex patterns and enhancing the network’s representational capacity during training.
def sigmoid(Z):
    """
    Implements the sigmoid activation in numpy

    Arguments:
    Z -- numpy array of any shape

    Returns:
    A -- output of sigmoid(z), same shape as Z
    cache -- returns Z as well, useful during backpropagation
    """
    A = 1/(1+np.exp(-Z))
    cache = Z

    return A, cache
def relu(Z):
    """
    Implement the RELU function.

    Arguments:
    Z -- Output of the linear layer, of any shape

    Returns:
    A -- Post-activation parameter, of the same shape as Z
    cache -- the input Z, stored for computing the backward pass efficiently
    """
    A = np.maximum(0,Z)

    assert(A.shape == Z.shape)

    cache = Z
    return A, cache
def relu_backward(dA, cache):
    """
    Implement the backward propagation for a single RELU unit.

    Arguments:
    dA -- post-activation gradient, of any shape
    cache -- 'Z' where we store for computing backward propagation efficiently

    Returns:
    dZ -- Gradient of the cost with respect to Z
    """
    Z = cache
    dZ = np.array(dA, copy=True) # just converting dZ to a correct object.

    # When Z <= 0, set dZ to 0 as well.
    dZ[Z <= 0] = 0

    assert (dZ.shape == Z.shape)

    return dZ
def sigmoid_backward(dA, cache):
    """
    Implement the backward propagation for a single SIGMOID unit.

    Arguments:
    dA -- post-activation gradient, of any shape
    cache -- 'Z' where we store for computing backward propagation efficiently

    Returns:
    dZ -- Gradient of the cost with respect to Z
    """
    Z = cache

    s = 1/(1+np.exp(-Z))
    dZ = dA * s * (1-s)

    assert (dZ.shape == Z.shape)

    return dZ
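As an optional sanity check (an illustrative addition, not part of the original notebook), the analytic gradient returned by sigmoid_backward can be compared against a centered finite-difference approximation of the sigmoid's derivative:
# Illustrative gradient check: analytic vs. numerical derivative of sigmoid.
Z_check = np.array([[-1.0, 0.5, 2.0]])
dA_ones = np.ones_like(Z_check)                      # upstream gradient of ones
eps = 1e-6
numeric = (sigmoid(Z_check + eps)[0] - sigmoid(Z_check - eps)[0]) / (2 * eps)
analytic = sigmoid_backward(dA_ones, Z_check)        # equals sigmoid(Z) * (1 - sigmoid(Z))
print(np.allclose(numeric, analytic))                # expected: True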
Forward Propagation
Forward Propagation Implementation for Neural Network Layers
The code snippet presents the implementation of forward propagation for a neural network’s layers, encompassing both linear transformation and activation functions. The following functions are integral components of this forward propagation process:
linear_forward Function:
- This function computes the linear part of a layer’s forward propagation.
- It calculates the pre-activation parameter Z using the weights W, the activations A from the previous layer, and the bias b.
- The resulting pre-activation parameter Z is fundamental for subsequent activation functions.
- The calculated values A, W, and b are cached in a tuple (cache) for efficient computation during the backward pass.
def linear_forward(A, W, b):
    """
    Implement the linear part of a layer's forward propagation.

    Arguments:
    A -- activations from previous layer (or input data): (size of previous layer, number of examples)
    W -- weights matrix: numpy array of shape (size of current layer, size of previous layer)
    b -- bias vector, numpy array of shape (size of the current layer, 1)

    Returns:
    Z -- the input of the activation function, also called pre-activation parameter
    cache -- a python tuple containing "A", "W" and "b" ; stored for computing the backward pass efficiently
    """
    Z = W.dot(A) + b

    assert(Z.shape == (W.shape[0], A.shape[1]))

    cache = (A, W, b)

    return Z, cache
linear_activation_forward Function:
- This function implements the complete forward propagation for the layer, combining the linear transformation and an activation function (sigmoid or ReLU).
- Depending on the specified activation function, it computes the post-activation value A using the pre-activation Z obtained from the linear_forward step.
- The resultant post-activation value A, along with the linear and activation function caches (linear_cache and activation_cache), is stored in a tuple (cache). These caches are crucial for efficient computation during the subsequent backward pass.
Both functions work in tandem to compute the forward pass through a layer, incorporating linear transformations and different activation functions based on the specified requirements (sigmoid or ReLU). The caches obtained during this process facilitate the backward propagation for updating the neural network’s parameters during training.
def linear_activation_forward(A_prev, W, b, activation):
    """
    Implement the forward propagation for the LINEAR->ACTIVATION layer

    Arguments:
    A_prev -- activations from previous layer (or input data): (size of previous layer, number of examples)
    W -- weights matrix: numpy array of shape (size of current layer, size of previous layer)
    b -- bias vector, numpy array of shape (size of the current layer, 1)
    activation -- the activation to be used in this layer, stored as a text string: "sigmoid" or "relu"

    Returns:
    A -- the output of the activation function, also called the post-activation value
    cache -- a python tuple containing "linear_cache" and "activation_cache";
             stored for computing the backward pass efficiently
    """
    if activation == "sigmoid":
        # Inputs: "A_prev, W, b". Outputs: "A, activation_cache".
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = sigmoid(Z)

    elif activation == "relu":
        # Inputs: "A_prev, W, b". Outputs: "A, activation_cache".
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = relu(Z)

    assert (A.shape == (W.shape[0], A_prev.shape[1]))
    cache = (linear_cache, activation_cache)

    return A, cache
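A short shape check (an illustrative addition with made-up toy sizes, not part of the original notebook) shows how one call propagates a batch of examples through a single ReLU layer:
# Illustrative: one forward step through a 3-unit ReLU layer for 4 toy examples.
A_prev_demo = np.random.randn(5, 4)   # 5 input features, 4 examples
W_demo = np.random.randn(3, 5)        # 3 units in the current layer
b_demo = np.zeros((3, 1))
A_demo, cache_demo = linear_activation_forward(A_prev_demo, W_demo, b_demo, activation="relu")
print(A_demo.shape)                   # expected: (3, 4)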
Forward Propagation in L-layer Neural Networks
The function L_model_forward orchestrates the forward propagation process within an L-layer neural network. It executes a sequence of [LINEAR->RELU]*(L-1)->LINEAR->SIGMOID computations. Here’s how it works:
- Initialization:
  - Receives input data X and a set of pre-initialized parameters for each layer.
  - Initializes an empty list caches to store the cache values for later use.
- Layer-wise Computation:
  - Iterates through each layer (from 1 to L-1) except the output layer.
  - Performs the [LINEAR -> ACTIVATION] step, using ReLU activation for hidden layers.
  - Updates the current activation A and stores the computed cache for each layer in caches.
- Output Layer Computation:
  - Executes a final [LINEAR -> SIGMOID] step for the output layer to get the last post-activation value AL.
  - Appends this cache value to the caches list.
- Validation and Return:
  - Asserts the shape of the final post-activation value AL.
  - Returns AL, representing the predicted values or probabilities, and caches, containing the cached values from each layer’s computation.
This function facilitates the sequential execution of the forward propagation steps, ensuring that each layer’s activation values are appropriately computed and stored for subsequent use during backward propagation.
def L_model_forward(X, parameters):
    """
    Implement forward propagation for the [LINEAR->RELU]*(L-1)->LINEAR->SIGMOID computation

    Arguments:
    X -- data, numpy array of shape (input size, number of examples)
    parameters -- output of initialize_parameters_deep()

    Returns:
    AL -- last post-activation value
    caches -- list of caches containing:
                every cache of linear_activation_forward() with "relu" (there are L-1 of them, indexed from 0 to L-2)
                the cache of linear_activation_forward() with "sigmoid" (there is one, indexed L-1)
    """
    caches = []
    A = X
    L = len(parameters) // 2 # number of layers in the neural network

    # Implement [LINEAR -> RELU]*(L-1). Add "cache" to the "caches" list.
    for l in range(1, L):
        A_prev = A
        A, cache = linear_activation_forward(A_prev, parameters['W' + str(l)], parameters['b' + str(l)], activation = "relu")
        caches.append(cache)

    # Implement LINEAR -> SIGMOID. Add "cache" to the "caches" list.
    AL, cache = linear_activation_forward(A, parameters['W' + str(L)], parameters['b' + str(L)], activation = "sigmoid")
    caches.append(cache)

    assert(AL.shape == (1,X.shape[1]))

    return AL, caches
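A quick end-to-end shape check (an illustrative addition using freshly initialized parameters, not part of the original notebook) confirms the expected output dimensions for the 4-layer configuration:
# Illustrative: forward pass of 3 fake examples through freshly initialized parameters.
X_demo = np.random.randn(12288, 3)
AL_demo, caches_demo = L_model_forward(X_demo, initialize_parameters_deep(layers_dims))
print(AL_demo.shape, len(caches_demo))   # expected: (1, 3) 4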
Cost Computation
Cross-Entropy Cost Function Implementation
The function compute_cost(AL, Y) calculates the cross-entropy cost, an essential metric for evaluating the performance of a classification neural network. The implementation follows the cross-entropy formula defined by equation (7). It takes in the predicted probability vector AL and the true label vector Y, both of shape (1, number of examples).
The steps involved in this implementation are:
- Initialization: Obtain the number of examples m from the shape of Y.
- Compute Loss: Calculate the cross-entropy loss using the formula:
\[ \text{cost} = -\frac{1}{m} \sum_{i=1}^{m} \left( Y^{(i)} \log\left(AL^{(i)}\right) + \left(1 - Y^{(i)}\right) \log\left(1 - AL^{(i)}\right) \right) \]
This equation compares the predicted probability vector AL against the true label vector Y, summing the logarithmic loss over every example and averaging it across all m examples. In the code, the sum is computed with the vectorized expression -np.dot(Y, np.log(AL).T) - np.dot(1-Y, np.log(1-AL).T).
- Squeeze and Assert: To reduce the result to the expected scalar, np.squeeze() is applied (for example, turning [[17]] into 17). An assertion then confirms that the cost has the expected shape.
- Return: The computed cost value is returned to the calling function.
This cost function is crucial in the training process of the neural network, as it quantifies the dissimilarity between the predicted and actual labels, guiding the optimization process towards minimizing this dissimilarity during model training.
def compute_cost(AL, Y):
    """
    Implement the cost function defined by equation (7).

    Arguments:
    AL -- probability vector corresponding to your label predictions, shape (1, number of examples)
    Y -- true "label" vector (for example: containing 0 if non-cat, 1 if cat), shape (1, number of examples)

    Returns:
    cost -- cross-entropy cost
    """
    m = Y.shape[1]

    # Compute loss from aL and y.
    cost = (1./m) * (-np.dot(Y,np.log(AL).T) - np.dot(1-Y, np.log(1-AL).T))

    cost = np.squeeze(cost) # To make sure your cost's shape is what we expect (e.g. this turns [[17]] into 17).
    assert(cost.shape == ())

    return cost
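A tiny worked example (illustrative values, not part of the original notebook) helps verify the formula: with one cat example predicted at probability 0.8 and one non-cat example predicted at probability 0.2, the cost is -(log(0.8) + log(0.8)) / 2 = -log(0.8) ≈ 0.2231.
# Illustrative: cross-entropy cost for two toy predictions.
AL_toy = np.array([[0.8, 0.2]])   # predicted probabilities of "cat"
Y_toy  = np.array([[1,   0  ]])   # true labels
print(compute_cost(AL_toy, Y_toy))   # expected: ~0.2231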
Backward Propagation
Backward Propagation Derivation and Implementation for Neural Network Layers
This section focuses on the backward propagation process, a crucial step in training neural networks. It encompasses two main functions: linear_backward and linear_activation_backward. These functions form the backbone of the backpropagation algorithm for a single layer and for a layer combined with an activation function, respectively.
linear_backward function:
- Computes the gradients of the cost function with respect to the layer’s parameters (weights and bias) and the activation output of the previous layer.
- Utilizes the chain rule to compute gradients for the current layer’s weights, biases, and the activation output of the previous layer.
- Derives gradients using the computed dZ (gradient of the cost with respect to the linear output) and the values cached during the forward propagation step.
def linear_backward(dZ, cache):
    """
    Implement the linear portion of backward propagation for a single layer (layer l)

    Arguments:
    dZ -- Gradient of the cost with respect to the linear output (of current layer l)
    cache -- tuple of values (A_prev, W, b) coming from the forward propagation in the current layer

    Returns:
    dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same shape as A_prev
    dW -- Gradient of the cost with respect to W (current layer l), same shape as W
    db -- Gradient of the cost with respect to b (current layer l), same shape as b
    """
    A_prev, W, b = cache
    m = A_prev.shape[1]

    dW = 1./m * np.dot(dZ, A_prev.T)
    db = 1./m * np.sum(dZ, axis = 1, keepdims = True)
    dA_prev = np.dot(W.T, dZ)

    assert (dA_prev.shape == A_prev.shape)
    assert (dW.shape == W.shape)
    assert (db.shape == b.shape)

    return dA_prev, dW, db
linear_activation_backward function:
- Implements the backward propagation for the combination of linear and activation layers.
- Allows for flexibility in choosing different activation functions (sigmoid or ReLU) while performing backward propagation.
- Combines the gradients calculated by the linear_backward function with the gradients derived from the activation function’s backward pass.
These functions collectively enable the computation of gradients at each layer of the neural network during backpropagation. Understanding this section is pivotal for comprehending how the model learns from its mistakes and adjusts its parameters to minimize the cost function.
def linear_activation_backward(dA, cache, activation):
    """
    Implement the backward propagation for the LINEAR->ACTIVATION layer.

    Arguments:
    dA -- post-activation gradient for current layer l
    cache -- tuple of values (linear_cache, activation_cache) we store for computing backward propagation efficiently
    activation -- the activation to be used in this layer, stored as a text string: "sigmoid" or "relu"

    Returns:
    dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same shape as A_prev
    dW -- Gradient of the cost with respect to W (current layer l), same shape as W
    db -- Gradient of the cost with respect to b (current layer l), same shape as b
    """
    linear_cache, activation_cache = cache

    if activation == "relu":
        dZ = relu_backward(dA, activation_cache)
        dA_prev, dW, db = linear_backward(dZ, linear_cache)

    elif activation == "sigmoid":
        dZ = sigmoid_backward(dA, activation_cache)
        dA_prev, dW, db = linear_backward(dZ, linear_cache)

    return dA_prev, dW, db
Gradient Descent and Parameter Updates
The code provided contains functions crucial for implementing the backpropagation algorithm in a neural network model. Specifically, it focuses on calculating gradients for parameters and updating these parameters using the gradient descent optimization technique.
- L_model_backward() Function:
- This function executes the backward propagation for a neural network, specifically designed for the architecture of [LINEAR -> RELU] * (L-1) -> LINEAR -> SIGMOID. It takes in the final output probability vector (AL), the true label vector (Y), and a list of caches generated during the forward propagation.
- Initializes backpropagation by computing the derivative of the cost function with respect to the final output layer (dAL).
- Then iterates through the layers in reverse order, applying the appropriate backward activations (sigmoid or ReLU) to calculate gradients for the parameters (dW and db) and activation of the previous layer (dA).
- Finally, returns a dictionary containing the computed gradients.
- update_parameters() Function:
- This function implements the parameter update step using gradient descent.
- It takes the current set of parameters, the gradients calculated from the backward propagation, and the learning rate as inputs.
- Using a loop through the layers, it updates the weights (W) and biases (b) by subtracting the product of the learning rate and the corresponding gradients.
- Returns the updated parameters for the neural network.
These functions collectively form the core of the training process in a neural network. They compute the gradients of the cost function with respect to the parameters, enabling iterative updates through gradient descent to optimize the network’s parameters for improved performance during training.
def L_model_backward(AL, Y, caches):
    """
    Implement the backward propagation for the [LINEAR->RELU] * (L-1) -> LINEAR -> SIGMOID group

    Arguments:
    AL -- probability vector, output of the forward propagation (L_model_forward())
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat)
    caches -- list of caches containing:
                every cache of linear_activation_forward() with "relu" (there are (L-1) of them, indexed from 0 to L-2)
                the cache of linear_activation_forward() with "sigmoid" (there is one, index L-1)

    Returns:
    grads -- A dictionary with the gradients
             grads["dA" + str(l)] = ...
             grads["dW" + str(l)] = ...
             grads["db" + str(l)] = ...
    """
    grads = {}
    L = len(caches) # the number of layers
    m = AL.shape[1]
    Y = Y.reshape(AL.shape) # after this line, Y is the same shape as AL

    # Initializing the backpropagation
    dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))

    # Lth layer (SIGMOID -> LINEAR) gradients. Inputs: "AL, Y, caches". Outputs: "grads["dAL"], grads["dWL"], grads["dbL"]
    current_cache = caches[L-1]
    grads["dA" + str(L-1)], grads["dW" + str(L)], grads["db" + str(L)] = linear_activation_backward(dAL, current_cache, activation = "sigmoid")

    for l in reversed(range(L-1)):
        # lth layer: (RELU -> LINEAR) gradients.
        current_cache = caches[l]
        dA_prev_temp, dW_temp, db_temp = linear_activation_backward(grads["dA" + str(l + 1)], current_cache, activation = "relu")
        grads["dA" + str(l)] = dA_prev_temp
        grads["dW" + str(l + 1)] = dW_temp
        grads["db" + str(l + 1)] = db_temp

    return grads
def update_parameters(parameters, grads, learning_rate):
    """
    Update parameters using gradient descent

    Arguments:
    parameters -- python dictionary containing your parameters
    grads -- python dictionary containing your gradients, output of L_model_backward
    learning_rate -- learning rate of the gradient descent update rule

    Returns:
    parameters -- python dictionary containing your updated parameters
                  parameters["W" + str(l)] = ...
                  parameters["b" + str(l)] = ...
    """
    L = len(parameters) // 2 # number of layers in the neural network

    # Update rule for each parameter. Use a for loop.
    for l in range(L):
        parameters["W" + str(l+1)] = parameters["W" + str(l+1)] - learning_rate * grads["dW" + str(l+1)]
        parameters["b" + str(l+1)] = parameters["b" + str(l+1)] - learning_rate * grads["db" + str(l+1)]

    return parameters
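The update rule is easiest to see on toy values; the following sketch (illustrative numbers, not part of the original notebook) applies one gradient-descent step to a single-layer parameter set:
# Illustrative: one update step with learning rate 0.1.
params_toy = {"W1": np.array([[1.0, 2.0]]), "b1": np.array([[0.5]])}
grads_toy  = {"dW1": np.array([[0.1, -0.2]]), "db1": np.array([[0.05]])}
print(update_parameters(params_toy, grads_toy, learning_rate=0.1))
# expected: W1 -> [[0.99, 2.02]], b1 -> [[0.495]]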
Model Training
Training of Deep Neural Network Models
The function L_layer_model implements the training process for an L-layer neural network model. It employs a deep neural network architecture of the form [LINEAR -> RELU]*(L-1) -> LINEAR -> SIGMOID, allowing flexibility in defining the number of layers and their respective sizes.
Key Components:
- Initialization: The function initializes the parameters of the neural network using the deep initialization method initialize_parameters_deep.
- Iterations (Gradient Descent): It performs a specified number of iterations (controlled by num_iterations) to optimize the parameters. During each iteration:
  - Forward Propagation: The forward pass computes the activations (AL) and caches through the neural network layers using L_model_forward.
  - Cost Computation: The cost function is computed to evaluate the performance of the network through compute_cost.
  - Backward Propagation: The gradients are calculated using backpropagation via L_model_backward.
  - Parameter Update: The parameters are updated using the calculated gradients and a specified learning rate through update_parameters.
- Cost Tracking: It maintains a record of the cost every 100 iterations (and at the final iteration) and appends it to the costs list.
- Print Cost (Optional): The print_cost parameter enables the display of the cost value every 100 iterations, for tracking the convergence of the model during training.
Returns:
The function returns the optimized parameters learned during training (parameters) and a list of costs (costs) tracked over the iterations, which can be useful for visualizing the learning curve and assessing convergence.
Usage:
This function is versatile, allowing the training of deep neural networks by defining the number and size of layers (layers_dims), input data (X), true labels (Y), learning rate (learning_rate), and the number of optimization iterations (num_iterations). The print_cost flag controls whether the cost is printed during training iterations, providing flexibility for monitoring training progress.
def L_layer_model(X, Y, layers_dims, learning_rate=0.0075, num_iterations=3000, print_cost=False):
    """
    Implements a L-layer neural network: [LINEAR->RELU]*(L-1)->LINEAR->SIGMOID.

    Arguments:
    X -- data, numpy array of shape (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 1 if cat, 0 if non-cat), of shape (1, number of examples)
    layers_dims -- list containing the input size and each layer size, of length (number of layers + 1).
    learning_rate -- learning rate of the gradient descent update rule
    num_iterations -- number of iterations of the optimization loop
    print_cost -- if True, it prints the cost every 100 steps

    Returns:
    parameters -- parameters learnt by the model. They can then be used to predict.
    costs -- list of costs recorded every 100 iterations (and at the final iteration)
    """
    np.random.seed(1)
    costs = [] # keep track of cost

    # Parameters initialization
    parameters = initialize_parameters_deep(layers_dims)

    # Loop (gradient descent)
    for i in range(num_iterations):
        # Forward propagation: [LINEAR -> RELU]*(L-1) -> LINEAR -> SIGMOID
        AL, caches = L_model_forward(X, parameters)

        # Compute cost
        cost = compute_cost(AL, Y)

        # Backward propagation
        grads = L_model_backward(AL, Y, caches)

        # Update parameters
        parameters = update_parameters(parameters, grads, learning_rate)

        # Print the cost every 100 iterations
        if print_cost and (i % 100 == 0 or i == num_iterations - 1):
            print("Cost after iteration {}: {}".format(i, np.squeeze(cost)))
        if i % 100 == 0 or i == num_iterations - 1:
            costs.append(cost)

    return parameters, costs
Training the Model: Gradient Descent and Parameter Updates
This section involves the iterative process of optimizing the neural network through gradient descent and updating the model parameters. The L_layer_model function, called with specified parameters such as num_iterations = 2500 and print_cost = True, encapsulates the core of this process.
- Gradient Descent Iterations: The function runs a loop for a specified number of iterations (num_iterations = 2500) in which the model undergoes forward and backward propagation to compute the gradients and update the parameters.
- Cost Evaluation: During each iteration, the cost (or loss) is computed; it is printed every 100 iterations when print_cost is set to True. This allows monitoring the model’s learning progress over the iterations.
- Parameter Updates: Through backpropagation, gradients are computed for each layer, and the parameters (weights and biases) are updated using the learning rate and the calculated gradients to minimize the cost function.
This section represents the heart of training the neural network, where the model learns from the training data by adjusting its parameters iteratively to minimize the cost, eventually improving its predictive capability.
parameters, costs = L_layer_model(train_x, train_y, layers_dims, num_iterations = 2500, print_cost = True)
Cost after iteration 0: 0.7717493284237686
Cost after iteration 100: 0.6720534400822914
Cost after iteration 200: 0.6482632048575212
Cost after iteration 300: 0.6115068816101354
Cost after iteration 400: 0.5670473268366111
Cost after iteration 500: 0.5401376634547801
Cost after iteration 600: 0.5279299569455267
Cost after iteration 700: 0.4654773771766851
Cost after iteration 800: 0.36912585249592794
Cost after iteration 900: 0.39174697434805344
Cost after iteration 1000: 0.3151869888600617
Cost after iteration 1100: 0.2726998441789385
Cost after iteration 1200: 0.23741853400268134
Cost after iteration 1300: 0.19960120532208644
Cost after iteration 1400: 0.18926300388463305
Cost after iteration 1500: 0.16118854665827748
Cost after iteration 1600: 0.14821389662363316
Cost after iteration 1700: 0.13777487812972944
Cost after iteration 1800: 0.12974017549190123
Cost after iteration 1900: 0.12122535068005211
Cost after iteration 2000: 0.11382060668633713
Cost after iteration 2100: 0.10783928526254132
Cost after iteration 2200: 0.10285466069352679
Cost after iteration 2300: 0.10089745445261787
Cost after iteration 2400: 0.09287821526472397
Cost after iteration 2499: 0.088439943441702
Observations:
- Cost Decrease: The cost reduces progressively as the number of iterations increases. This reduction indicates that the model is learning and optimizing its parameters to better fit the training data.
- Convergence: Initially, the cost starts relatively high at 0.7717 and decreases steadily as training proceeds. As the iterations progress, the rate of reduction in the cost diminishes, implying that the model is converging towards an optimal solution.
- Stabilization: After around 2000 iterations, the cost reduction slows down considerably, with smaller decreases in subsequent iterations. This signifies that the model’s improvement becomes marginal, indicating it is approaching convergence or an optimal solution.
Overall, the output demonstrates the iterative nature of the training process, showing how the neural network learns and adjusts its parameters to minimize the cost function, thereby enhancing its predictive capability on the training data.
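The slowdown is easy to quantify from the recorded checkpoints; the following snippet (an illustrative addition, not part of the original notebook) prints the cost drop between successive 100-iteration checkpoints:
# Illustrative: per-checkpoint cost decrease (every 100 iterations).
deltas = -np.diff(np.array(costs, dtype=float))
print(np.round(deltas[:3], 4), "...", np.round(deltas[-3:], 4))
# The early drops are roughly an order of magnitude larger than the late ones.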
Model Evaluation
Model Performance Evaluation and Accuracy Assessment
This section evaluates the performance of the trained L-layer neural network model by assessing its prediction accuracy on both the training and test datasets. The function predict() is utilized to make predictions on the provided datasets (train_x and test_x) using the trained model parameters.
The function performs the following steps:
- Takes the dataset examples (X) and corresponding labels (y) as inputs along with the trained model parameters.
- Utilizes forward propagation through the neural network to obtain predictions (p) for the given dataset X.
- Converts the raw probabilities (probas) into binary predictions (0 or 1) based on a threshold of 0.5.
- Calculates and displays the accuracy of the predictions by comparing them with the true labels (y).
def predict(X, y, parameters):
    """
    This function is used to predict the results of a L-layer neural network.

    Arguments:
    X -- data set of examples you would like to label
    y -- true labels, used to report the accuracy
    parameters -- parameters of the trained model

    Returns:
    p -- predictions for the given dataset X
    """
    m = X.shape[1]
    n = len(parameters) // 2 # number of layers in the neural network
    p = np.zeros((1, m))

    # Forward propagation
    probas, caches = L_model_forward(X, parameters)

    # convert probas to 0/1 predictions
    for i in range(0, probas.shape[1]):
        if probas[0,i] > 0.5:
            p[0,i] = 1
        else:
            p[0,i] = 0

    print("Accuracy: " + str(np.sum((p == y)/m)))

    return p
pred_train = predict(train_x, train_y, parameters)
Accuracy: 0.9856459330143539
pred_test = predict(test_x, test_y, parameters)
Accuracy: 0.8
The reported accuracy for the training dataset is approximately 98.56%, while the accuracy for the test dataset stands at 80.0%. Assessing the accuracy on both datasets provides insights into the model’s performance on seen and unseen data, indicating its capability to generalize beyond the training set.
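With the trained parameters in hand, the same pipeline can also classify a single external image. The sketch below is an illustrative addition (the file path "images/my_image.jpg" and its label are hypothetical, not part of the notebook above); it uses the PIL import from the setup section to resize the image to the expected 64x64x3 input:
# Sketch: classify one external image with the trained parameters (hypothetical file path).
my_image = "images/my_image.jpg"                     # hypothetical path to a local image
img = Image.open(my_image).resize((num_px, num_px))  # resize to 64x64
x = np.asarray(img, dtype=np.float64)[:, :, :3]      # drop any alpha channel
x = x.reshape((num_px * num_px * 3, 1)) / 255.       # flatten and normalize like train_x
my_label = np.array([[1]])                           # assumed true label, only used for the accuracy printout
my_pred = predict(x, my_label, parameters)
print("Model prediction: " + classes[int(my_pred[0, 0])].decode("utf-8"))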
Results Visualization
Cost Evolution During Training Iterations
The visualization depicts the evolution of the cost or loss function over training iterations. It is generated using the plot_costs function, which takes in the costs array as input. The y-axis represents the cost value, while the x-axis shows the iterations in multiples of hundreds.
The plot demonstrates the decreasing trend of the cost function with increasing iterations, indicating the model’s learning progress. It’s evident from the displayed cost values after each iteration that the cost gradually decreases. Initially, the cost is relatively high, reflecting the model’s higher error rate, but it decreases consistently over training epochs. As the iterations progress, the cost approaches a lower value, signifying the model’s improvement in minimizing errors and getting closer to optimal parameters.
The plotted curve illustrates how the cost decreases over time, providing insight into the model’s learning behavior and convergence towards better performance.
def plot_costs(costs, learning_rate=0.0075):
    plt.plot(np.squeeze(costs))
    plt.ylabel('cost')
    plt.xlabel('iterations (per hundreds)')
    plt.title("Learning rate =" + str(learning_rate))
    plt.show()
plot_costs(costs, learning_rate)
Visualizing Misclassified Images
This section utilizes a function named print_mislabeled_images() to visually represent misclassified images. The function takes parameters including the class labels, the dataset (X), the true labels (y), and the predicted labels (p). It plots and displays images where the predictions made by the model differ from the true labels.
The code identifies and extracts misclassified indices, highlighting cases where the sum of predicted and true labels is equal to 1. It then generates visualizations for these misclassified images by iterating through the identified indices. The plot showcases the image, its predicted class label, and the actual class label.
In this specific instance, the function displays 10 misclassified images, enabling an insightful view into instances where the model’s predictions did not align with the ground truth labels. The visualization aids in understanding the nature of misclassifications and potentially identifying patterns or challenges within the model’s performance.
def print_mislabeled_images(classes, X, y, p):
    """
    Plots images where predictions and truth were different.
    X -- dataset
    y -- true labels
    p -- predictions
    """
    a = p + y
    mislabeled_indices = np.asarray(np.where(a == 1))
    plt.rcParams['figure.figsize'] = (40.0, 40.0) # set default size of plots
    num_images = len(mislabeled_indices[0])
    for i in range(num_images):
        index = mislabeled_indices[1][i]

        plt.subplot(2, num_images, i + 1)
        plt.imshow(X[:,index].reshape(64,64,3), interpolation='nearest')
        plt.axis('off')
        plt.title("Prediction: " + classes[int(p[0,index])].decode("utf-8") + " \n Class: " + classes[y[0,index]].decode("utf-8"))
print_mislabeled_images(classes, test_x, test_y, pred_test)
Conclusion: Assessing the Cat Image Classification Neural Network
The development of the cat image classification neural network showcases the application of deep learning techniques for image recognition tasks. Throughout this project, several key aspects have been addressed:
Model Architecture and Training: The notebook implemented an L-layer neural network from scratch using numpy and scipy. The architecture included linear and activation functions (ReLU and sigmoid) and underwent iterative training to optimize parameters for better accuracy.
Performance Evaluation: The trained models were evaluated on both the training and test sets. The assessment included metrics like accuracy, cost, and visual representation of incorrectly classified images. These evaluations provided insights into the model’s ability to distinguish cat images from non-cat images.
Key Findings: The L-layer network reached approximately 98.6% accuracy on the training set and 80% on the test set, demonstrating that a deep architecture can learn useful representations of cat images while still leaving room for better generalization.
Challenges and Future Directions: Despite the successful implementation, challenges like overfitting, optimization convergence, and computational intensity were encountered. To further enhance the model’s performance, regularization techniques, hyperparameter tuning, and exploring more complex architectures could be considered.
Application and Relevance: The cat image classification neural network exemplifies the practical application of machine learning in real-world scenarios, specifically in image recognition tasks. The skills acquired and lessons learned from this project lay a solid foundation for tackling more complex image classification problems.
Final Thoughts: Building a neural network from scratch not only provided a deep understanding of its inner workings but also emphasized the significance of data preprocessing, model architecture, and iterative optimization in achieving robust and accurate predictions.
In essence, this project not only successfully classified cat images but also served as a stepping stone towards understanding and refining neural networks for diverse image recognition applications.