Multiclass Classification Neural Network
This example neural network consists of four layers and is used to demonstrate the computations.
Layer A is the input layer.
Layers B and C are the hidden layers.
Layer Ŷ is the output layer.
Creating The Model
Our Neural Network can be modeled as a class with the following attributes and methods:
- List of layers (each layer represented as a numpy array)
- Architecture (specifies the number of nodes in each layer)
- A list of connections consisting of the weights, bias, and activation function needed to compute the value of each node.
- Forward propagation and back propagation methods.
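As a rough sketch of how these attributes might fit together, the class below stores the architecture, one array per layer, and one connection per pair of adjacent layers. The class name, the dictionary keys, the random initialization, and the node counts `[4, 5, 5, 3]` are all assumptions made for illustration; the two propagation methods are left out here.

```python
import numpy as np


class NeuralNetwork:
    """Illustrative container for the layers and connections of the model."""

    def __init__(self, architecture, seed=0):
        # architecture: number of nodes in each layer, e.g. [4, 5, 5, 3]
        self.architecture = list(architecture)
        rng = np.random.default_rng(seed)
        # One column vector per layer, to be filled in by forward propagation.
        self.layers = [np.zeros((n, 1)) for n in self.architecture]
        # One connection between each pair of adjacent layers.
        self.connections = [
            {
                "weights": rng.normal(0.0, 0.1, size=(n_out, n_in)),
                "bias": np.zeros((n_out, 1)),
                "activation": "sigmoid",
            }
            for n_in, n_out in zip(self.architecture[:-1], self.architecture[1:])
        ]


net = NeuralNetwork([4, 5, 5, 3])
print([c["weights"].shape for c in net.connections])
```

Storing each weight matrix with shape (nodes in the right layer, nodes in the left layer) lets the value of a layer be computed as a single matrix-vector product.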
Forward propagation consists of computing the value of each node in the Neural Network, starting at the input layer and moving towards the output layer. The value of each node can be represented as a linear combination of the nodes in the previous layer, plus a bias, transformed by some activation function (g).
Here g is the sigmoid activation function.
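A minimal sketch of forward propagation, assuming each layer is a column vector and each connection holds a weight matrix `W` and a bias vector `b`, so that the next layer is `g(W @ a + b)`. The function names, node counts, and random initial values are illustrative assumptions.

```python
import numpy as np


def sigmoid(z):
    """Sigmoid activation g(z) = 1 / (1 + e^(-z)), applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))


def forward(x, weights, biases):
    """Return the activations of every layer, starting with the input layer."""
    activations = [x]
    for W, b in zip(weights, biases):
        z = W @ activations[-1] + b      # linear combination plus bias
        activations.append(sigmoid(z))   # transformed by the activation g
    return activations


rng = np.random.default_rng(0)
sizes = [4, 5, 5, 3]  # hypothetical node counts for layers A, B, C and the output layer
weights = [rng.normal(0.0, 0.1, (n_out, n_in)) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros((n_out, 1)) for n_out in sizes[1:]]

x = rng.random((4, 1))                    # one input example as a column vector
activations = forward(x, weights, biases)
y_hat = activations[-1]                   # output layer
print(y_hat.ravel())
```

Keeping the activations of every layer, rather than only the output, is convenient because back propagation needs them later.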
In training a Neural Network, the goal is to find the values of the weights and biases that minimize some cost function J.
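The optimal weights and biases can be found numerically via stochastic gradient descent, a method in which a randomly chosen training example is used to update the weights and biases of each connection: the update matrix, scaled by a learning rate, is subtracted from the current weight matrix, and the biases are updated in the same way.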
The update matrix consists of the partial derivatives of the cost function (J) with respect to each weight in the connection.
For the derivation of these partial derivatives, see "Neural Networks: Computing Partial Derivatives", which computes the partial derivatives of the cross-entropy cost function.
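As a small sketch of the update rule, assuming the usual gradient descent form in which each parameter moves against its partial derivative by a step size (the learning rate). The function name, the learning rate of 0.1, and the numbers are made up for illustration.

```python
import numpy as np


def sgd_update(W, b, dJ_dW, dJ_db, learning_rate=0.1):
    """Return updated copies of W and b after one gradient descent step."""
    W_new = W - learning_rate * dJ_dW   # weight matrix minus scaled update matrix
    b_new = b - learning_rate * dJ_db   # bias vector minus scaled update vector
    return W_new, b_new


W = np.array([[0.5, -0.2], [0.1, 0.3]])
b = np.array([[0.0], [0.1]])
dJ_dW = np.array([[0.2, 0.0], [-0.4, 0.1]])  # hypothetical update matrix
dJ_db = np.array([[0.5], [-0.5]])            # hypothetical update vector

W_new, b_new = sgd_update(W, b, dJ_dW, dJ_db, learning_rate=0.1)
print(W_new)
print(b_new)
```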
The weight update matrix and bias update vector for each connection in our example Neural Network can be computed as follows.
Compute the 𝛿 vector for the output layer.
The update matrix for connection 3 is computed by multiplying the 𝛿 vector of the output layer by the transpose of the layer to its left (Layer C). This is the outer product of the two vectors.
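The update rule for connection 3 then subtracts this update matrix, scaled by the learning rate, from the weights of connection 3, and subtracts the scaled 𝛿 vector from its biases.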
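A worked sketch of this step with made-up numbers. It assumes the cross-entropy cost is summed over sigmoid output nodes, in which case the output 𝛿 vector reduces to the prediction minus the target; the linked derivation should be checked for the exact form used in this article. All values and variable names below are hypothetical.

```python
import numpy as np

# Hypothetical values: 3 output nodes, 2 nodes in Layer C.
y_hat = np.array([[0.7], [0.2], [0.1]])  # output layer activations
y = np.array([[1.0], [0.0], [0.0]])      # one-hot target
layer_c = np.array([[0.5], [0.9]])       # Layer C activations
W3 = np.zeros((3, 2))                    # weights of connection 3
b3 = np.zeros((3, 1))                    # biases of connection 3
learning_rate = 0.1                      # assumed step size

delta_out = y_hat - y                # assumed form of the output-layer deltas
update_W3 = delta_out @ layer_c.T    # outer product: (3, 1) @ (1, 2) -> (3, 2)
update_b3 = delta_out                # bias update vector

W3 = W3 - learning_rate * update_W3
b3 = b3 - learning_rate * update_b3
print(update_W3)
```

Each entry of the update matrix pairs one output node's 𝛿 with one Layer C activation, which is why the result has the same shape as the weight matrix.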
The 𝛿 vector of Layer C can be computed from the 𝛿 vector of the output layer and the transpose of the weight matrix of connection 3.
The update rule for connection 2 then follows the same pattern, using the outer product of the 𝛿 vector of Layer C and Layer B.
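The sketch below illustrates this step with made-up numbers. It assumes sigmoid activations in Layer C, so the deltas pulled back through connection 3 are also multiplied elementwise by the sigmoid derivative, which can be written as `c * (1 - c)` in terms of the activations. The values and variable names are hypothetical.

```python
import numpy as np

# Hypothetical values: 3 output nodes, 2 nodes in Layer C, 3 nodes in Layer B.
W3 = np.array([[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]])  # weights of connection 3
delta_out = np.array([[-0.3], [0.2], [0.1]])           # output-layer deltas
layer_c = np.array([[0.5], [0.9]])                     # Layer C activations
layer_b = np.array([[0.2], [0.4], [0.6]])              # Layer B activations

# Pull the deltas back through connection 3, then scale by the sigmoid
# derivative g'(z) = g(z) * (1 - g(z)), expressed via the activations.
delta_c = (W3.T @ delta_out) * layer_c * (1.0 - layer_c)

update_W2 = delta_c @ layer_b.T  # outer product, shape (2, 3)
update_b2 = delta_c              # bias update vector for connection 2
print(delta_c.ravel())
```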
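Compute the 𝛿 vector of Layer B in the same way, from the 𝛿 vector of Layer C and the transpose of the weight matrix of connection 2.
Update the weights and biases in connection 1.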
In summary, back propagation starts at the output layer and repeats the following steps:
- Compute the deltas
- From the deltas, compute the update matrix
- Update the weights and biases
- Move left one layer
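The steps above can be sketched as a single loop over the connections, from right to left. This is an illustrative version, not necessarily the implementation used in this article: it assumes sigmoid activations everywhere and a cross-entropy cost summed over the output nodes, so that the output deltas are the prediction minus the target. Function names, node counts, and the learning rate are assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(x, weights, biases):
    """Return the activations of every layer, input layer first."""
    activations = [x]
    for W, b in zip(weights, biases):
        activations.append(sigmoid(W @ activations[-1] + b))
    return activations


def backward(activations, y, weights):
    """Return one weight update matrix and one bias update vector per connection."""
    updates_W = [None] * len(weights)
    updates_b = [None] * len(weights)
    delta = activations[-1] - y  # output-layer deltas (assumed cost, see above)
    for i in reversed(range(len(weights))):
        updates_W[i] = delta @ activations[i].T  # outer product with the layer to the left
        updates_b[i] = delta
        if i > 0:
            a = activations[i]
            delta = (weights[i].T @ delta) * a * (1.0 - a)  # move left one layer
    return updates_W, updates_b


rng = np.random.default_rng(0)
sizes = [4, 5, 5, 3]  # hypothetical node counts for layers A, B, C and the output layer
weights = [rng.normal(0.0, 0.5, (n_out, n_in)) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros((n_out, 1)) for n_out in sizes[1:]]
x = rng.random((4, 1))
y = np.array([[0.0], [1.0], [0.0]])

updates_W, updates_b = backward(forward(x, weights, biases), y, weights)

learning_rate = 0.1  # assumed step size
for i in range(len(weights)):
    weights[i] -= learning_rate * updates_W[i]
    biases[i] -= learning_rate * updates_b[i]
```

This sketch collects all the updates before applying any of them, so every 𝛿 vector is computed from the weights as they were during forward propagation.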
Training The Model
Now that we have implemented the forward propagation and back propagation methods, we can create our neural network and train the model via stochastic gradient descent.
1. Load/normalize the data and create the model
2. Output the accuracy with the initial weights and biases
3. Train the model via Stochastic Gradient Descent (SGD)
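The three steps can be sketched end to end as follows. The dataset used in this article is not shown here, so the sketch substitutes a small synthetic three-class dataset; the min-max normalization, node counts, learning rate, number of epochs, and all function names are assumptions for illustration. It also assumes sigmoid activations with output deltas equal to the prediction minus the one-hot target.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(x, weights, biases):
    activations = [x]
    for W, b in zip(weights, biases):
        activations.append(sigmoid(W @ activations[-1] + b))
    return activations


def accuracy(X, labels, weights, biases):
    # The predicted class is the index of the largest output node.
    predictions = [int(np.argmax(forward(x.reshape(-1, 1), weights, biases)[-1])) for x in X]
    return float(np.mean(np.array(predictions) == labels))


def train_sgd(X, Y, weights, biases, learning_rate=0.5, epochs=20, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        for n in rng.permutation(len(X)):            # visit examples in random order
            acts = forward(X[n].reshape(-1, 1), weights, biases)
            delta = acts[-1] - Y[n].reshape(-1, 1)   # assumed output-layer deltas
            for i in reversed(range(len(weights))):
                grad_W, grad_b = delta @ acts[i].T, delta
                if i > 0:  # propagate deltas left using the pre-update weights
                    delta = (weights[i].T @ delta) * acts[i] * (1.0 - acts[i])
                weights[i] -= learning_rate * grad_W
                biases[i] -= learning_rate * grad_b


# 1. Load/normalize data (synthetic stand-in) and create the model.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [4.0, 4.0], [0.0, 4.0]])
labels = rng.integers(0, 3, size=150)
X = centers[labels] + rng.normal(0.0, 0.5, size=(150, 2))
X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))  # min-max scale to [0, 1]
Y = np.eye(3)[labels]                                       # one-hot targets
sizes = [2, 5, 5, 3]
weights = [rng.normal(0.0, 1.0, (n_out, n_in)) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros((n_out, 1)) for n_out in sizes[1:]]

# 2. Accuracy with the initial weights and biases.
print("initial accuracy:", accuracy(X, labels, weights, biases))

# 3. Train via SGD, then evaluate again.
train_sgd(X, Y, weights, biases)
print("trained accuracy:", accuracy(X, labels, weights, biases))
```

Accuracy here is measured on the training data only; a held-out test set would give a fairer estimate of how well the model generalizes.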
Machine Learning: An Applied Mathematics Introduction by Paul Wilmott.