Monday, July 24, 2017

Neural Network Ramp Up Notes

This post will list a number of resources related to neural networks and provide a summary of important points.  Please consult the source articles to learn more.  The notes below are merely a means for study.  In fact, entire sentences are lifted from the source material.

A Quick Introduction to Neural Networks

An artificial neural network is a computational model that is inspired by biological neural networks.
The basic unit of computation in a neural network is the neuron (also called a node or unit). It receives input from other nodes and computes an output. Each input has an associated weight (w), which is assigned on the basis of its relative importance to the other inputs. The node applies a function (f) to the weighted sum of its inputs. The function f is non-linear and is called the activation function. There is also another input, the constant 1, with weight b (called the bias).

Output of neuron: $Y = f(w_1 X_1 + w_2 X_2 + b)$
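A minimal Python sketch of the neuron formula above, assuming a sigmoid for the activation function f. The input, weight, and bias values are made up for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias, f=sigmoid):
    """Return Y = f(w1*X1 + w2*X2 + ... + b) for a single neuron."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return f(weighted_sum)


# Illustrative values only: two inputs, two weights and a bias.
# Weighted sum = 0.4*1.0 + (-0.2)*0.5 + 0.1 = 0.4
y = neuron_output(inputs=[1.0, 0.5], weights=[0.4, -0.2], bias=0.1)
print(round(y, 4))  # 0.5987
```

The bias is added after the weighted sum, which is the same as treating it as the weight on a constant input of 1.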

Every activation function takes a single number and performs a certain fixed mathematical operation on it. Several activation functions are encountered in practice, including sigmoid, tanh, and ReLU.
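The three activation functions named above can be sketched in a few lines of Python using only the standard `math` module. Each one takes a single number and returns a single number.

```python
import math


def sigmoid(z: float) -> float:
    """Maps a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z: float) -> float:
    """Maps a real number into the range (-1, 1)."""
    return math.tanh(z)


def relu(z: float) -> float:
    """Rectified linear unit: replaces negative values with zero."""
    return max(0.0, z)


for z in (-2.0, 0.0, 2.0):
    print(f"z={z:+.1f}  sigmoid={sigmoid(z):.3f}  tanh={tanh(z):.3f}  relu={relu(z):.1f}")
```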

The main function of the bias is to provide every node with a trainable constant value (in addition to the normal inputs that the node receives).

A feedforward neural network is the simplest type of artificial neural network. It can consist of three types of nodes: input, hidden, and output nodes. Data flows in only one direction, from the input nodes through the hidden nodes to the output nodes; there are no cycles or loops. A Multi Layer Perceptron (MLP) contains one or more hidden layers. An MLP can learn the relationship between the features (x1, x2) and the target y.
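A sketch of the forward pass through a small MLP, assuming 2 inputs, one hidden layer of 2 sigmoid neurons, and 1 sigmoid output neuron. All weight and bias values are arbitrary placeholders, not trained values. Data moves strictly from the inputs to the hidden layer to the output, with no loops back.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """Compute one fully connected layer.

    weights[j] holds the incoming weights of neuron j; biases[j] is its bias.
    """
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def mlp_forward(x, hidden_w, hidden_b, out_w, out_b):
    """Input layer -> hidden layer -> output layer, in one direction only."""
    hidden = layer_forward(x, hidden_w, hidden_b)
    return layer_forward(hidden, out_w, out_b)


# Placeholder parameters: 2 inputs -> 2 hidden neurons -> 1 output neuron.
hidden_w = [[0.5, -0.3], [0.8, 0.2]]
hidden_b = [0.0, -0.1]
out_w = [[1.0, -1.5]]
out_b = [0.2]

y = mlp_forward([1.0, 2.0], hidden_w, hidden_b, out_w, out_b)
print(y)  # a list holding one value between 0 and 1
```

Learning, which the next paragraphs describe, amounts to adjusting `hidden_w`, `hidden_b`, `out_w`, and `out_b` so that this output matches the target y.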

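Example: predict whether a student passes the final term (1) or not (0) from two features, hours studied and mid-term marks. Because the target takes one of two values, this is known as a binary classification problem.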

The process by which an MLP learns is called the Backpropagation (BackProp) algorithm. BackProp is like "learning from mistakes": the supervisor corrects the ANN whenever it makes a mistake.
In supervised learning, the training set is labeled. This means that for some given inputs, we know the desired/expected output (label).

For BackProp, all the edge weights are initially assigned at random. For every input in the training dataset, the ANN is activated and its output is observed. The output is compared with the desired output that we already know, and the error is "propagated" back to the previous layer. This error is noted and the weights are "adjusted" accordingly. This process is repeated until the output error is below a predetermined threshold.
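The steps above can be sketched as a small training loop. This is a minimal illustration, assuming a 2-input MLP with 3 sigmoid hidden neurons and 1 sigmoid output, a quadratic error, and per-example weight updates. The toy dataset (the logical AND of two inputs), the learning rate, the threshold, and the helper names `forward`, `train`, and `predict` are all hypothetical choices made for this sketch.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_h, b_h, w_o, b_o):
    """Return the hidden activations and the single output activation."""
    h = [sigmoid(sum(w * xi for w, xi in zip(wj, x)) + bj) for wj, bj in zip(w_h, b_h)]
    out = sigmoid(sum(w * hj for w, hj in zip(w_o, h)) + b_o)
    return h, out


def train(samples, n_hidden=3, lr=0.5, threshold=0.01, max_epochs=20000, seed=0):
    """Train a 2-input, n_hidden, 1-output sigmoid MLP with backprop."""
    rng = random.Random(seed)
    # 1. Randomly assign all edge weights (and biases).
    w_h = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(n_hidden)]
    b_h = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    w_o = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b_o = rng.uniform(-1, 1)

    for epoch in range(1, max_epochs + 1):
        total_error = 0.0
        for x, target in samples:
            # 2. Activate the network and observe its output.
            h, out = forward(x, w_h, b_h, w_o, b_o)
            # 3. Compare the output with the known label.
            error = out - target
            total_error += error ** 2
            # 4. Propagate the error back through the sigmoids (chain rule).
            delta_o = error * out * (1 - out)
            delta_h = [delta_o * w_o[j] * h[j] * (1 - h[j]) for j in range(n_hidden)]
            # 5. Adjust weights and biases against their gradients.
            for j in range(n_hidden):
                w_o[j] -= lr * delta_o * h[j]
                w_h[j] = [w - lr * delta_h[j] * xi for w, xi in zip(w_h[j], x)]
                b_h[j] -= lr * delta_h[j]
            b_o -= lr * delta_o
        # 6. Stop once the mean squared error is below the threshold.
        if total_error / len(samples) < threshold:
            break

    def predict(x):
        return forward(x, w_h, b_h, w_o, b_o)[1]

    return predict, epoch


# Hypothetical toy training set: the logical AND of two binary inputs.
AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
predict, epochs_used = train(AND_DATA)
for x, label in AND_DATA:
    print(x, label, round(predict(x), 3))
```

`max_epochs` is only a safety net so the loop always ends; on an easy dataset like this one the error threshold should normally be reached first.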

Once the above algorithm terminates, we have a "learned" ANN.

See: for more information on the back propagation algorithm.

Using CNNs To Speed Up Systems

Convolutional neural networks (CNNs) are becoming one of the key differentiators in system performance.  The most important power impact from CNNs is in the ADAS (advanced driver assistance system) application.  Power is a critical factor for CNNs.  The key part of a CNN is matrix multiplication.  So far, it is not clear which platform is best for this (SIMD, FPGA, DSP, or GPU).  There are significant power and performance tradeoffs associated with CNNs.

At their most basic, CNNs are a combination of MAC (multiply-accumulate) operations and memory handling.  "You need to be able to get the data in and out quickly so that you don't starve your MAC."
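To show why a CNN boils down to MACs plus data movement, here is a sketch of a 2-D convolution (no padding, stride 1) written as explicit multiply-accumulate steps. Like most CNN libraries it computes cross-correlation (the kernel is not flipped). The pure-Python loops are for clarity only; SIMD, DSP, GPU, or FPGA implementations would tile and vectorise this work. Every MAC needs one image value and one kernel value, which is why memory bandwidth can starve the MAC units.

```python
def conv2d_valid(image, kernel):
    """2-D convolution (no padding, stride 1) written as explicit MACs.

    Returns the output feature map and the number of MACs performed.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = [[0.0] * out_w for _ in range(out_h)]
    macs = 0
    for r in range(out_h):
        for c in range(out_w):
            acc = 0.0  # the accumulator
            for i in range(kh):
                for j in range(kw):
                    # One multiply-accumulate: two operands fetched, one add.
                    acc += image[r + i][c + j] * kernel[i][j]
                    macs += 1
            output[r][c] = acc
    return output, macs


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
kernel = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]  # picks out the centre pixel, so the result is easy to check by eye
feature_map, macs = conv2d_valid(image, kernel)
print(feature_map)  # [[6.0, 7.0], [10.0, 11.0]]
print(macs)         # 36 = 2x2 outputs * 9 MACs each
```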

Is there a difference between NN and CNNs?

NN is a generic name for a large class of machine learning algorithms.  Most of the algorithms are trained with back propagation.

In the late 1980s and early 1990s, the dominant algorithm in neural nets (and machine learning in general) was the fully connected neural network.  These networks have a large number of parameters and do not scale well.  Then came CNNs, which are not fully connected and in which neurons share weights.  These types of neural nets have proven successful, especially in the fields of computer vision and natural language processing.
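To make "do not scale well" and "share weights" concrete, here is a small parameter-count comparison. The layer sizes (a 28x28 input image, 30 fully connected hidden neurons versus 20 feature maps with 5x5 kernels) are illustrative assumptions, not figures from the notes above.

```python
def dense_params(n_inputs: int, n_neurons: int) -> int:
    """Fully connected layer: one weight per input per neuron, plus one bias per neuron."""
    return n_inputs * n_neurons + n_neurons


def conv_params(kernel_h: int, kernel_w: int, n_feature_maps: int, in_channels: int = 1) -> int:
    """Convolutional layer: each feature map reuses one kernel and one bias
    at every position in the image."""
    return (kernel_h * kernel_w * in_channels + 1) * n_feature_maps


# 28x28 input image = 784 pixels.
print(dense_params(784, 30))   # 23550: 30 fully connected hidden neurons
print(conv_params(5, 5, 20))   # 520: 20 feature maps with shared 5x5 kernels
```

The convolutional count does not depend on the image size at all, because the same 26 parameters per feature map are applied at every position.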

Neural Networks and Deep Learning Book

Chapter 1

Using neural nets to recognize handwritten digits.  The idea is to take a large number of handwritten digits, known as training examples, and develop a system that can learn from those training examples.

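A perceptron takes several binary inputs and produces a single binary output.  The inputs are weighted: the perceptron outputs 1 if the weighted sum (the dot product of weights and inputs) exceeds a threshold, and 0 otherwise.  This is a step function, not a sigmoid; the sigmoid neuron, introduced later in the chapter, replaces the step with a smooth curve so that small changes in the weights cause only small changes in the output.  Example:  Go to the cheese festival?  Decision factors: x1 is whether the weather is good, x2 is whether your girlfriend wants to go, etc.  A weight is applied to each decision factor to reflect how much it matters.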
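A sketch of the cheese-festival perceptron. The third factor (x3, whether the festival is near public transit), the weights, and the threshold are illustrative values chosen so that good weather alone decides the outcome.

```python
def perceptron(inputs, weights, threshold):
    """Return 1 if the weighted sum of binary inputs exceeds the threshold, else 0."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum > threshold else 0


# Hypothetical decision factors (1 = yes, 0 = no):
#   x1: is the weather good?
#   x2: does your girlfriend want to go?
#   x3: is the festival near public transit?
weights = [6, 2, 2]  # illustrative: the weather matters most
threshold = 5

print(perceptron([1, 0, 0], weights, threshold))  # 1: good weather alone is enough
print(perceptron([0, 1, 1], weights, threshold))  # 0: 2 + 2 = 4 does not exceed 5
```

Writing the threshold as a bias b = -threshold gives the equivalent rule: output 1 if w·x + b > 0, else 0.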

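For the handwriting recognition problem, each image is 28x28 pixels, so there are 784 input neurons.  The output layer has 10 neurons, one per digit 0-9; the neuron with the highest activation indicates the result.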
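How an image maps onto the input layer and how the output layer is read back can be sketched as follows. The blank image and the output activations are hypothetical placeholders, not real MNIST data or real network outputs.

```python
def flatten(image):
    """Turn a 28x28 grid of pixel intensities into a 784-element input vector."""
    return [pixel for row in image for pixel in row]


def predicted_digit(output_activations):
    """The digit whose output neuron has the highest activation."""
    return max(range(len(output_activations)), key=lambda d: output_activations[d])


blank_image = [[0.0] * 28 for _ in range(28)]  # placeholder for a real MNIST image
x = flatten(blank_image)
print(len(x))  # 784 input neurons

# Hypothetical activations of the 10 output neurons (digits 0-9).
outputs = [0.01, 0.02, 0.05, 0.03, 0.01, 0.02, 0.01, 0.90, 0.04, 0.02]
print(predicted_digit(outputs))  # 7
```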

(I am omitting all the math in this chapter)

Here's how the program presented in the chapter works:

  • Load the MNIST training data set
  • Set up the neural network: 784 input neurons, 30 in the second (hidden) layer, 10 at the output
  • Use stochastic gradient descent to learn from the MNIST training data over 30 epochs, with a mini-batch size of 10 and a learning rate of 3.0
    • Stochastic gradient descent is the standard learning algorithm for neural networks
    • A mini-batch is a small, randomly chosen subset of the training inputs; the gradient is estimated from each mini-batch
    • Once we have exhausted all the training inputs, we have completed an epoch of training
    • Increasing the learning rate can improve results when it starts out too low
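The mechanics of mini-batch stochastic gradient descent can be sketched on a much simpler model than the chapter's network. This sketch swaps in a 1-D linear model y = w*x + b with a quadratic cost, and toy data generated from y = 2x + 1; those choices, the learning rate of 0.5, and the 200 epochs are assumptions for illustration (the chapter's 3.0 learning rate was chosen for its own network, not this model). The mini-batch size of 10 matches the chapter.

```python
import random


def sgd(data, epochs, mini_batch_size, eta, seed=0):
    """Mini-batch stochastic gradient descent for a 1-D linear model y = w*x + b.

    Per-example cost: C_x = (w*x + b - y)**2 / 2.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)  # new random mini-batches every epoch
        for k in range(0, len(data), mini_batch_size):
            batch = data[k:k + mini_batch_size]
            # Average the gradient of C_x over the mini-batch.
            grad_w = sum((w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum((w * x + b - y) for x, y in batch) / len(batch)
            # Step against the gradient, scaled by the learning rate eta.
            w -= eta * grad_w
            b -= eta * grad_b
        # Every training input has now been used once: one epoch is complete.
    return w, b


# Hypothetical toy data from y = 2x + 1 (not MNIST).
training_data = [(i / 100, 2 * (i / 100) + 1) for i in range(100)]
w, b = sgd(training_data, epochs=200, mini_batch_size=10, eta=0.5)
print(round(w, 3), round(b, 3))
```

In the chapter's program the same loop structure applies, but the per-mini-batch gradient comes from backpropagation through the network instead of the two closed-form expressions used here.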

Deep learning means more layers: deep neural networks have two or more hidden layers.

Chapter 2

Backprop was originally introduced in the 1970s.  It is mathematically intensive.  Backprop is the workhorse of learning in neural networks.  At the heart of backprop is an expression for the partial derivative $\partial C / \partial w$ of the cost function C with respect to any weight w (or bias b) in the network.  The expression tells us how quickly the cost changes when we change the weights and biases.  It gives insight into how changing the weights and biases changes the overall behavior of the network.
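A sketch of what that partial derivative means for the smallest possible case: a single sigmoid neuron with a quadratic cost on one training example. The chain-rule expression is what backprop computes; the finite-difference version nudges w and measures how fast the cost changes, and is used here only as a check (it would be far too slow for a real network). The example values are arbitrary.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def cost(w, b, x, y):
    """Quadratic cost of a single sigmoid neuron on one training example."""
    a = sigmoid(w * x + b)
    return 0.5 * (a - y) ** 2


def dcost_dw(w, b, x, y):
    """Analytic dC/dw via the chain rule: (a - y) * sigmoid'(z) * x."""
    a = sigmoid(w * x + b)
    return (a - y) * a * (1 - a) * x


def numeric_dcost_dw(w, b, x, y, eps=1e-6):
    """Central finite difference: nudge w and see how fast the cost changes."""
    return (cost(w + eps, b, x, y) - cost(w - eps, b, x, y)) / (2 * eps)


w, b, x, y = 0.7, -0.2, 1.5, 1.0  # arbitrary example values
print(dcost_dw(w, b, x, y), numeric_dcost_dw(w, b, x, y))
```

The two numbers should agree to many decimal places; a negative value means increasing w would lower the cost for this example.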
