Multiclass Classification Neural Network

Olaniyi O

4 min read · May 4, 2021

Neural Network

This example neural network consists of four layers and will be used to demonstrate the computations.

Layer A is the input layer.

Layers B and C are the hidden layers.

Layer Ŷ is the output layer.

Creating The Model

Our Neural Network can be modeled as a class with the following attributes and methods:

List of layers (each layer represented as a numpy array)
Architecture (specifies the number of nodes in each layer)
A list of connections consisting of the weights, bias, and activation function needed to compute the value of each node.
A forward propagation method and a backpropagation method.
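The attribute list above can be sketched as a small class. This is only an illustration under assumptions: the class name `NeuralNetwork`, the dictionary keys, the random initialisation, and the layer sizes `[4, 5, 5, 3]` are hypothetical and not taken from the article, and the two methods are left out of this sketch.

```python
import numpy as np


def sigmoid(z):
    """Sigmoid activation, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))


class NeuralNetwork:
    """Hypothetical container mirroring the attribute list (not the author's code)."""

    def __init__(self, architecture, seed=0):
        rng = np.random.default_rng(seed)
        # Number of nodes in each layer, e.g. [4, 5, 5, 3] (assumed sizes).
        self.architecture = list(architecture)
        # One column vector per layer; values are filled in by forward propagation.
        self.layers = [np.zeros((n, 1)) for n in self.architecture]
        # One connection between each pair of adjacent layers.
        self.connections = [
            {
                "weights": rng.normal(0.0, 0.1, size=(n_out, n_in)),
                "bias": np.zeros((n_out, 1)),
                "activation": sigmoid,
            }
            for n_in, n_out in zip(self.architecture[:-1], self.architecture[1:])
        ]


net = NeuralNetwork([4, 5, 5, 3])
print(len(net.layers), len(net.connections))
```

With four layers there are three connections, and each weight matrix has one row per node in the layer to its right and one column per node in the layer to its left.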

Forward Propagation

Forward propagation computes the value of each node in the neural network, starting at the input layer and moving towards the output layer. The value of each node is a linear combination of the values in the previous layer (plus a bias), transformed by some activation function (g).

Here g is the sigmoid activation function, g(z) = 1 / (1 + e^(-z)).
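As a rough sketch of the forward pass described above, assuming each layer is a column vector and each connection stores a weight matrix `W` and bias vector `b` so that the next layer is `g(W @ a + b)`. The function name `forward` and the layer sizes are illustrative assumptions.

```python
import numpy as np


def sigmoid(z):
    """Sigmoid activation, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))


def forward(x, weights, biases):
    """Return the values of every layer, from the input layer to the output layer."""
    activations = [x]
    for W, b in zip(weights, biases):
        # Linear combination of the previous layer, then the activation g.
        activations.append(sigmoid(W @ activations[-1] + b))
    return activations


# Assumed sizes: 4 inputs, two hidden layers of 5 nodes, 3 outputs.
sizes = [4, 5, 5, 3]
weights = [np.zeros((n_out, n_in)) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros((n_out, 1)) for n_out in sizes[1:]]
layers = forward(np.ones((4, 1)), weights, biases)
print(layers[-1].ravel())
```

With all weights and biases set to zero, every node past the input layer evaluates to g(0) = 0.5, which makes a convenient sanity check.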

Backpropagation

In training a neural network, the goal is to find the values of the weights and biases that minimize some cost function J.

Cross Entropy Cost Function
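For reference, one common form of the cross-entropy cost for sigmoid outputs can be written as below. This is an assumption about which form is intended: it sums the binary cross-entropy over the output nodes for a single example, and the clipping constant `eps` is added here only to avoid `log(0)`.

```python
import numpy as np


def cross_entropy(y_hat, y, eps=1e-12):
    """Cross-entropy cost J for one example, summed over the output nodes.

    y_hat: predicted values in (0, 1); y: one-hot target of the same shape.
    """
    y_hat = np.clip(y_hat, eps, 1.0 - eps)  # guard against log(0)
    return float(-np.sum(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat)))


y = np.array([[1.0], [0.0], [0.0]])
confident = np.array([[0.9], [0.05], [0.05]])
unsure = np.array([[0.5], [0.5], [0.5]])
print(cross_entropy(confident, y), cross_entropy(unsure, y))
```

The cost is lower for the confident, correct prediction than for the undecided one, which is the behaviour gradient descent exploits.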

The optimal weights and biases can be found numerically via stochastic gradient descent, a method in which a randomly chosen training example is used to update the weights and biases according to the following update rule.

The update matrix consists of the partial derivatives of the cost function (J) with respect to each weight in the connection.
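A minimal numeric illustration of a gradient descent update, assuming the usual form in which each weight moves against its partial derivative, scaled by a learning rate. The name `learning_rate` and the numbers below are made up for the example.

```python
import numpy as np


def sgd_update(W, b, dW, db, learning_rate=0.1):
    """Return updated weights and bias: parameter minus learning_rate * gradient."""
    return W - learning_rate * dW, b - learning_rate * db


W = np.array([[1.0, 2.0]])    # weights of one connection (1 output node, 2 input nodes)
b = np.array([[0.0]])         # bias of that connection
dW = np.array([[0.5, -1.0]])  # update matrix: dJ/dW for each weight
db = np.array([[0.2]])        # update vector: dJ/db for each bias

W_new, b_new = sgd_update(W, b, dW, db)
print(W_new, b_new)
```

A positive partial derivative lowers the weight and a negative one raises it, so each step moves the cost downhill for that training example.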

See the derivation of the partial derivatives: "Neural Networks: Computing Partial Derivatives" (olasehinde12.medium.com).

The weight update matrix and bias update vector for our example neural network can be computed as follows:

Compute the 𝛿 vector for the output layer

The update matrix for Connection 3 is the outer product of two vectors: the 𝛿 vector of the output layer multiplied by the transpose of the layer to its left (Layer C).

The update rule for connection 3 is then
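The output-layer step can be sketched with small made-up vectors. One assumption is stated loudly here: with sigmoid outputs and a cross-entropy cost, the output 𝛿 vector is taken to be Ŷ − Y, which is the standard result but is not shown in the text above. The sizes (3 output nodes, 2 nodes in Layer C) are also illustrative.

```python
import numpy as np

y_hat = np.array([[0.7], [0.2], [0.1]])  # output layer values (made up)
y = np.array([[1.0], [0.0], [0.0]])      # one-hot target
c = np.array([[0.5], [0.9]])             # values of Layer C (made up)

# Assumed form of the output deltas for sigmoid + cross-entropy.
delta_out = y_hat - y

# Outer product: a column vector times a row vector gives a matrix
# with the same shape as the weight matrix of Connection 3.
dW3 = delta_out @ c.T
db3 = delta_out  # the bias update vector is the delta vector itself

print(dW3.shape)
print(dW3)
```

Each entry of `dW3` pairs one output node's delta with one Layer C node's value, so the matrix has one row per output node and one column per Layer C node.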

The 𝛿 vector of Layer C can be computed from the 𝛿 vector of the output layer and the transpose of the weight matrix of Connection 3.
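A sketch of how a hidden layer's deltas might be computed, assuming the standard backpropagation formula for sigmoid units: the deltas of the layer to the right are pulled back through the transposed weights and scaled by the sigmoid derivative a(1 − a). The function name `hidden_delta` and the numbers are hypothetical.

```python
import numpy as np


def hidden_delta(W_next, delta_next, activation):
    """Deltas of a hidden layer from the deltas of the layer to its right.

    W_next: weights of the connection leaving this layer (rows = next layer nodes).
    activation: this layer's sigmoid outputs; a * (1 - a) is the sigmoid derivative.
    """
    return (W_next.T @ delta_next) * activation * (1.0 - activation)


W3 = np.array([[2.0, 4.0], [0.0, 0.0], [0.0, 0.0]])  # Connection 3 (3 outputs, 2 nodes in C)
delta_out = np.array([[1.0], [0.0], [0.0]])          # output-layer deltas (made up)
c = np.array([[0.5], [0.5]])                         # values of Layer C (made up)

delta_c = hidden_delta(W3, delta_out, c)
print(delta_c.ravel())
```

Here the transpose sends the single non-zero output delta back along the weights 2 and 4, and the sigmoid derivative 0.5 × 0.5 = 0.25 scales the result to 0.5 and 1.0.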

The update rule for connection 2 is then

Compute the 𝛿 vector of Layer B

Update weights and biases in connection 1

Backpropagation Algorithm:

Starting at the output layer, repeat until the input layer is reached:

Compute the deltas
From the deltas, compute the update matrix
Update the weights and biases
Move left one layer
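The loop above could look roughly like the following for a single training example. This is a sketch under assumptions, not the article's implementation: the output deltas are taken as Ŷ − Y (sigmoid outputs with cross-entropy), the function names, learning rate, and layer sizes are hypothetical, and the deltas for the next layer to the left are computed before the current weights are overwritten.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(x, weights, biases):
    """Return the values of every layer, input first."""
    activations = [x]
    for W, b in zip(weights, biases):
        activations.append(sigmoid(W @ activations[-1] + b))
    return activations


def backprop_step(x, y, weights, biases, learning_rate=0.1):
    """One gradient step on a single example; updates the lists in place."""
    activations = forward(x, weights, biases)
    delta = activations[-1] - y  # output-layer deltas (assumed form)
    for k in reversed(range(len(weights))):
        left = activations[k]  # layer to the left of connection k
        dW = delta @ left.T    # update matrix (outer product)
        db = delta             # bias update vector
        if k > 0:
            # Deltas for the layer to the left, using the pre-update weights.
            delta_left = (weights[k].T @ delta) * left * (1.0 - left)
        weights[k] = weights[k] - learning_rate * dW
        biases[k] = biases[k] - learning_rate * db
        if k > 0:
            delta = delta_left  # move left one layer
    return activations[-1]


def cost(y_hat, y):
    return float(-np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)))


rng = np.random.default_rng(0)
sizes = [4, 5, 5, 3]  # assumed architecture
weights = [rng.normal(0, 0.5, (o, i)) for i, o in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros((o, 1)) for o in sizes[1:]]
x = rng.normal(size=(4, 1))
y = np.array([[1.0], [0.0], [0.0]])

cost_before = cost(forward(x, weights, biases)[-1], y)
for _ in range(200):
    backprop_step(x, y, weights, biases)
cost_after = cost(forward(x, weights, biases)[-1], y)
print(cost_before, cost_after)
```

Repeating the step on the same example should drive the cost down, which is a quick way to check that the deltas and update matrices are wired together consistently.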

Training The Model

Now that we have implemented the forward propagation and backpropagation methods, we can create our neural network and train the model via stochastic gradient descent.

1. Load/Normalize Data & Create Model

2. Output accuracy with the initial weights and biases:

3. Train the model via Stochastic Gradient Descent (SGD)
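Steps 2 and 3 can be sketched generically, independent of any particular dataset. Everything named here is hypothetical: `accuracy` takes any `predict(x)` function that returns an output vector, and `train_sgd` takes any `update(x, y)` function that performs one gradient step. Visiting the examples in a fresh random order each epoch is one common reading of "a random training example" and is an assumption of this sketch.

```python
import numpy as np


def accuracy(predict, X, Y):
    """Fraction of examples whose largest output matches the one-hot label."""
    hits = sum(int(np.argmax(predict(x)) == np.argmax(y)) for x, y in zip(X, Y))
    return hits / len(X)


def train_sgd(update, X, Y, epochs=10, seed=0):
    """Call update(x, y) on one randomly ordered example at a time."""
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            update(X[i], Y[i])


# Tiny stand-in data: three one-hot vectors used as both inputs and labels.
X = [np.eye(3)[:, [i]] for i in range(3)]
Y = [np.eye(3)[:, [i]] for i in range(3)]

calls = []
train_sgd(lambda x, y: calls.append((x, y)), X, Y, epochs=2)
print(len(calls))
print(accuracy(lambda x: x, X, Y))
```

In practice `predict` would run the forward pass and return the output layer, and `update` would run one backpropagation step, so the accuracy can be printed before and after training to see the effect of SGD.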