Edmond Cote's Blog: July 2017

Thursday, July 27, 2017

TensorFlow Ramp Up Notes

Quick post. This was written some time ago but cleaned up this morning. Similar in essence to a previous entry on neural nets. http://blog.edmondcote.com/2017/07/my-neural-network-ramp-up-notes.html

TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems

http://download.tensorflow.org/paper/whitepaper2015.pdf

TensorFlow computation is described by a directed graph that represents a dataflow computation, with extensions for maintaining/updating persistent state and for branching and looping control. Each node has zero or more inputs and zero or more outputs, and represents an instance of an operation. Values that flow along normal edges are called Tensors. Special edges called control dependencies can also exist. No data flows on such edges; they only require the source node to finish executing before the destination node starts.
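To make the graph model concrete, here is a toy dataflow graph in Python. The `Node` class and `run_graph` function are hypothetical and are not TensorFlow's API. They only show data edges carrying values, a control dependency that carries no data but still forces ordering, and execution in dependency order.

```python
from collections import deque

class Node:
    """One node = one instance of an operation (hypothetical, not TensorFlow's API)."""
    def __init__(self, name, fn, inputs=(), control_deps=()):
        self.name = name
        self.fn = fn                              # the computation this node performs
        self.inputs = list(inputs)                # data edges: values (tensors) flow here
        self.control_deps = list(control_deps)    # control edges: ordering only, no data

def run_graph(nodes):
    """Run every node once its data inputs and control dependencies have finished."""
    by_name = {n.name: n for n in nodes}
    pending = {n.name: len(n.inputs) + len(n.control_deps) for n in nodes}
    consumers = {n.name: [] for n in nodes}
    for n in nodes:
        for dep in n.inputs + n.control_deps:
            consumers[dep].append(n.name)
    ready = deque(name for name, count in pending.items() if count == 0)
    values, order = {}, []
    while ready:
        name = ready.popleft()
        node = by_name[name]
        # Only data-edge inputs are passed to the computation.
        values[name] = node.fn(*(values[i] for i in node.inputs))
        order.append(name)
        for c in consumers[name]:
            pending[c] -= 1
            if pending[c] == 0:
                ready.append(c)
    return values, order

log = []
graph = [
    Node("a", lambda: 2.0),
    Node("b", lambda: 3.0),
    Node("log", lambda: log.append("side effect")),
    # "mul" reads a and b, and must also wait for "log" although it gets no data from it.
    Node("mul", lambda x, y: x * y, inputs=["a", "b"], control_deps=["log"]),
]
values, order = run_graph(graph)
print(values["mul"])  # 6.0
```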

An operation has a name and represents an abstract computation (e.g., matrix multiply or add). An operation can have attributes that are provided at graph-construction time. A kernel is an implementation of an operation that can be run on a particular type of device (e.g., CPU or GPU).
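A minimal sketch of the operation/kernel split, assuming a simple dictionary registry keyed by (operation name, device type). The registry, `register_kernel`, `run_op`, and the kernels are hypothetical and are not how TensorFlow registers kernels internally. The `transpose_b` keyword stands in for an attribute fixed at graph-construction time.

```python
# Hypothetical registry: (operation name, device type) -> kernel function.
KERNELS = {}

def register_kernel(op_name, device_type):
    def decorator(fn):
        KERNELS[(op_name, device_type)] = fn
        return fn
    return decorator

@register_kernel("MatMul", "CPU")
def matmul_cpu(a, b, transpose_b=False):
    # transpose_b plays the role of an attribute set when the graph is built.
    if transpose_b:
        b = [list(row) for row in zip(*b)]
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

@register_kernel("Add", "CPU")
def add_cpu(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def run_op(op_name, device_type, *inputs, **attrs):
    """Look up the kernel for this (operation, device) pair and run it."""
    kernel = KERNELS.get((op_name, device_type))
    if kernel is None:
        raise LookupError(f"no kernel for {op_name} on {device_type}")
    return kernel(*inputs, **attrs)

print(run_op("MatMul", "CPU", [[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

A GPU kernel for the same "MatMul" operation would simply be another entry under `("MatMul", "GPU")`; the abstract operation stays the same.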

The main components of a TensorFlow system are the client, which communicates with the master, and one or more worker processes. Each worker process is responsible for arbitrating access to one or more devices (GPU, CPU, etc.) and executes a subgraph. Communication between nodes on different devices is achieved using Send/Receive primitives (over RDMA or TCP).
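A rough sketch of how a graph might be split across devices, assuming the node-to-device placement is already decided: every edge that crosses devices is cut and replaced by a Send node on the producing device and a Receive node on the consuming device, paired by a shared key. The function, node, and device names are hypothetical, and the actual transport (TCP, RDMA) is left out.

```python
def partition(edges, placement):
    """edges: list of (src, dst) node names; placement: node name -> device.
    Returns device -> list of edges in that device's subgraph."""
    subgraphs = {dev: [] for dev in set(placement.values())}
    for src, dst in edges:
        src_dev, dst_dev = placement[src], placement[dst]
        if src_dev == dst_dev:
            subgraphs[src_dev].append((src, dst))     # edge stays local
        else:
            key = f"{src}->{dst}"                     # rendezvous key pairing Send with Recv
            subgraphs[src_dev].append((src, f"send[{key}]"))
            subgraphs[dst_dev].append((f"recv[{key}]", dst))
    return subgraphs

edges = [("x", "matmul"), ("w", "matmul"), ("matmul", "relu")]
placement = {"x": "cpu:0", "w": "cpu:0", "matmul": "gpu:0", "relu": "gpu:0"}
for device, sub in sorted(partition(edges, placement).items()):
    print(device, sub)
```

Each worker then only has to run its own subgraph; the Send/Recv pairs are the only points where data leaves a device.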

Careful scheduling of TF operations can result in better system performance, specifically with respect to data transfers and memory usage (GPU memory is scarce).
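To see why scheduling matters for memory, the sketch below computes peak live memory for two valid orderings of the same small graph. The op names and tensor sizes (in MB) are made up, and the memory model (a tensor is freed right after its last consumer runs) is a simplification, not TensorFlow's allocator.

```python
def peak_memory(order, sizes, consumers):
    """Peak live memory for one execution order.

    order: op names in execution order; sizes: op -> output size (MB);
    consumers: op -> ops that read its output."""
    position = {op: i for i, op in enumerate(order)}
    live = peak = 0
    for i, op in enumerate(order):
        live += sizes[op]                 # allocate this op's output
        peak = max(peak, live)
        # Free every tensor whose last consumer is the op that just ran.
        for producer in order[: i + 1]:
            users = consumers.get(producer, [])
            if users and max(position[u] for u in users) == i:
                live -= sizes[producer]
    return peak

# Two independent branches: a large load reduced to a small value, then summed.
sizes = {"load_a": 100, "reduce_a": 1, "load_b": 100, "reduce_b": 1, "sum": 1}
consumers = {"load_a": ["reduce_a"], "load_b": ["reduce_b"],
             "reduce_a": ["sum"], "reduce_b": ["sum"]}

breadth_first = ["load_a", "load_b", "reduce_a", "reduce_b", "sum"]
depth_first = ["load_a", "reduce_a", "load_b", "reduce_b", "sum"]
print(peak_memory(breadth_first, sizes, consumers))  # 201
print(peak_memory(depth_first, sizes, consumers))    # 102
```

Finishing one branch before starting the other keeps only one large tensor alive at a time, which roughly halves the peak in this toy case.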

Pre-existing highly-optimized numerical libraries (BLAS, cuBLAS, etc.) are used to implement kernels for some operations.

Some ML algorithms, including those typically used for training neural networks, are tolerant of noise and reduced-precision arithmetic.
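One way to exploit that tolerance, and as far as I can tell what the whitepaper describes for lossy compression of data sent between devices, is to drop the low 16 mantissa bits of a float32 and fill them back in with zeros on the receiving side. The function names below are my own; this is a sketch, not TensorFlow code.

```python
import struct

def compress(x: float) -> int:
    """Keep the top 16 bits of the float32 encoding: sign, 8 exponent bits, 7 mantissa bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits >> 16

def decompress(h: int) -> float:
    """Restore a float32 by filling the 16 dropped mantissa bits with zeros."""
    return struct.unpack("<f", struct.pack("<I", h << 16))[0]

for value in [1.0, 3.14159265, -0.1, 1234.5678]:
    approx = decompress(compress(value))
    rel_err = abs(approx - value) / abs(value)
    print(f"{value!r:>12} -> {approx!r:<14} rel. error {rel_err:.2e}")
```

Because 7 mantissa bits survive, truncation keeps the relative error below 2^-7 (under 1%), while halving the bytes on the wire.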

Monday, July 24, 2017

Neural Network Ramp Up Notes

This post will list a number of resources related to neural networks and provide a summary of important points. Please consult the source articles to learn more. The notes below are merely a means for study. In fact, entire sentences are lifted from the source material.

A Quick Introduction to Neural Networks

https://ujjwalkarn.me/2016/08/09/quick-intro-neural-networks/

An artificial neural network is a computational model that is inspired by biological neural networks.
The basic unit of computation in a neural network is the neuron (or node, unit). It receives input from other nodes and computes an output. Each input has an associated weight (w), which is assigned on the basis of its relative importance to the other inputs. The node applies a function (f) to the weighted sum of its inputs. The function f is non-linear and is called the activation function. There is also an extra input with a constant value of 1 and weight b (called the bias).

Output of neuron: Y = f(w1·X1 + w2·X2 + b)
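The formula above in Python, as a sketch assuming a sigmoid activation and made-up weights and inputs:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias, f=sigmoid):
    """Y = f(w1*X1 + w2*X2 + ... + b)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return f(z)

# Two inputs with illustrative weights: 0.5*1.0 + (-0.5)*0.0 + (-0.5) = 0.0
y = neuron_output([1.0, 0.0], weights=[0.5, -0.5], bias=-0.5)
print(y)  # 0.5, since sigmoid(0.0) = 0.5
```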

Every activation function takes a single number and performs a certain fixed mathematical operation on it. Several activation functions are encountered in practice: sigmoid, tanh, and ReLU.
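The three activation functions named above, written out as plain Python functions (each maps one number to one number):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))   # squashes any input into (0, 1)

def tanh(z):
    return math.tanh(z)                  # squashes any input into (-1, 1)

def relu(z):
    return max(0.0, z)                   # 0 for negative inputs, identity otherwise

for z in (-2.0, 0.0, 2.0):
    print(z, round(sigmoid(z), 4), round(tanh(z), 4), relu(z))
```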

The main function of the bias is to provide every node with a trainable constant value (in addition to the normal inputs that the node receives).

A feedforward neural network is the simplest type of artificial neural network. It can consist of three types of nodes: input, hidden, and output nodes. Data flows in only one direction, and there are no cycles or loops. A Multi Layer Perceptron (MLP) contains one or more hidden layers and can learn the relationship between the features (x1, x2) and the target y.
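A minimal forward pass for a small MLP, assuming sigmoid activations everywhere. The layer sizes (2 inputs, 3 hidden nodes, 1 output) and all weights are made up; the point is only that each layer's outputs feed the next layer and nothing flows backward.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    """One fully connected layer: each row of `weights` feeds one node."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def mlp_forward(x, layers):
    """Feedforward pass: input -> hidden -> output, with no cycles."""
    activation = x
    for weights, biases in layers:
        activation = layer_forward(activation, weights, biases)
    return activation

# Illustrative parameters: features (x1, x2) -> 3 hidden nodes -> 1 output node.
layers = [
    ([[0.2, -0.4], [0.7, 0.1], [-0.3, 0.5]], [0.0, -0.1, 0.2]),  # hidden layer
    ([[0.6, -0.2, 0.3]], [0.05]),                                # output layer
]
print(mlp_forward([1.0, 0.5], layers))
```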

Example: predict whether a student passes the final term (1) or not (0) from the hours studied and the mid-term marks. This is known as a binary classification problem.

The process by which an MLP learns is called the Backpropagation (BackProp) algorithm. BackProp is like "learning from mistakes": the supervisor corrects the ANN whenever it makes a mistake.
In supervised learning, the training set is labeled. This means, for some given inputs, we know the desired/expected output (label).

For BackProp, all the edge weights are initially assigned at random. For every input in the training dataset, the ANN is activated and its output is observed. The output is compared with the desired output that we already know, and the error is "propagated" back to the previous layer. This error is noted and the weights are "adjusted" accordingly. This process is repeated until the output error is below a predetermined threshold.
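The steps above, as one concrete and deliberately small instance: a network with one hidden layer of sigmoid units, trained online with squared error on the AND truth table (a toy dataset chosen for illustration). The learning rate, threshold, seed, and epoch cap are assumptions; the `max_epochs` guard stops the loop if the threshold is never reached.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_backprop(data, n_hidden=3, lr=0.5, threshold=0.01, max_epochs=20000, seed=0):
    """Online backprop for an n_in -> n_hidden -> 1 network of sigmoid units.
    Returns a predict function and the mean squared error after each epoch."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # Initially, all edge weights (and biases) are assigned at random.
    w_h = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b_h = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    w_o = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b_o = rng.uniform(-1, 1)

    def predict(x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w_h, b_h)]
        return h, sigmoid(sum(w * hj for w, hj in zip(w_o, h)) + b_o)

    history = []
    for _ in range(max_epochs):
        for x, target in data:
            h, y = predict(x)                                # activate, observe output
            delta_o = (y - target) * y * (1 - y)             # error at the output node
            delta_h = [delta_o * w_o[j] * h[j] * (1 - h[j])  # error propagated back
                       for j in range(n_hidden)]
            for j in range(n_hidden):                        # adjust weights accordingly
                w_o[j] -= lr * delta_o * h[j]
                for i in range(n_in):
                    w_h[j][i] -= lr * delta_h[j] * x[i]
                b_h[j] -= lr * delta_h[j]
            b_o -= lr * delta_o
        mse = sum((predict(x)[1] - t) ** 2 for x, t in data) / len(data)
        history.append(mse)
        if mse < threshold:                                  # stop below the threshold
            break
    return predict, history

AND = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
predict, history = train_backprop(AND)
print(f"epochs: {len(history)}, final MSE: {history[-1]:.4f}")
print([round(predict(x)[1]) for x, _ in AND])
```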

Once the above algorithm terminates, we have a "learned" ANN.

See:
https://www.quora.com/How-do-you-explain-back-propagation-algorithm-to-a-beginner-in-neural-network/answer/Hemanth-Kumar-Mantri for more information on the back propagation algorithm.

Using CNNs To Speed Up Systems

http://semiengineering.com/using-cnns-to-speed-up-systems

Convolutional neural networks (CNNs) are becoming one of the key differentiators in system performance. The most important power impact from CNNs is in the ADAS (advanced driver assistance system) application, and power is a critical factor for CNNs. The key part of a CNN is matrix multiplication. So far, it is not clear which platform is best for this (SIMD, FPGA, DSP, GPU). There are significant power and performance tradeoffs associated with CNNs.
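A sketch of why matrix multiplication is the key part: if every KxK patch of the input is unrolled into a row (a technique often called im2col), a whole convolution layer becomes a single matrix product. This assumes a single input channel, stride 1, and no padding; like most CNN code it computes cross-correlation (the kernel is not flipped). The image and filters are made up.

```python
def im2col(image, k):
    """Unroll every k x k patch of a 2-D image into one row."""
    h, w = len(image), len(image[0])
    return [[image[r + i][c + j] for i in range(k) for j in range(k)]
            for r in range(h - k + 1) for c in range(w - k + 1)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def conv2d_via_matmul(image, kernels):
    """kernels: list of k x k filters. Returns one flattened output map per filter."""
    k = len(kernels[0])
    patches = im2col(image, k)                                   # (num_patches, k*k)
    weights = [[v for row in f for v in row] for f in kernels]   # (num_filters, k*k)
    weights_t = [list(col) for col in zip(*weights)]             # (k*k, num_filters)
    out = matmul(patches, weights_t)                             # (num_patches, num_filters)
    return [list(col) for col in zip(*out)]                      # (num_filters, num_patches)

image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernels = [[[1, 0], [0, 1]], [[1, 1], [1, 1]]]
print(conv2d_via_matmul(image, kernels))
```

All the multiplies now sit in one `matmul` call, which is exactly the kind of operation SIMD units, DSPs, FPGAs, and GPUs compete to do efficiently.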

At their most basic, CNNs are a combination of multiply-accumulate (MAC) operations and memory handling. "You need to be able to get the data in and out quickly so that you don't starve your MAC."
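Written as direct loops, a convolution is nothing more than MAC operations, each of which needs one input value and one weight fetched from memory. The sketch counts MACs for a tiny single-channel example and gives the general count for a multi-channel layer; the 224x224x64 layer shape in the last line is hypothetical.

```python
def conv2d_direct(image, kernel):
    """Single-channel, stride-1, no-padding convolution. Returns (output, mac_count)."""
    k = len(kernel)
    out_h, out_w = len(image) - k + 1, len(image[0]) - k + 1
    out = [[0] * out_w for _ in range(out_h)]
    macs = 0
    for r in range(out_h):
        for c in range(out_w):
            acc = 0
            for i in range(k):
                for j in range(k):
                    acc += image[r + i][c + j] * kernel[i][j]   # one MAC: two reads, one multiply-add
                    macs += 1
            out[r][c] = acc
    return out, macs

def conv_layer_macs(out_h, out_w, out_channels, k, in_channels):
    """MACs for a multi-channel conv layer: one k*k*in_channels dot product per output value."""
    return out_h * out_w * out_channels * k * k * in_channels

out, macs = conv2d_direct([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[1, 1], [1, 1]])
print(out, macs)                            # [[12, 16], [24, 28]] 16
print(conv_layer_macs(224, 224, 64, 3, 3))  # 86704128
```

Even a modest layer needs tens of millions of MACs per image, and each one needs operands delivered on time, which is why memory handling matters as much as raw arithmetic.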

Is there a difference between NN and CNNs?

https://www.quora.com/Is-there-a-difference-between-neural-networks-and-convolutional-neural-networks

NN is a generic name for a large class of machine learning algorithms. Most of the algorithms are trained with back propagation.

In the late 1980s and early 1990s, the dominant algorithm in neural nets (and machine learning in general) was the fully connected neural network. These networks have a large number of parameters and do not scale well. Then came CNNs, which are not fully connected and in which neurons share weights. These types of neural nets have proven successful, especially in the fields of computer vision and natural language recognition.
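A quick parameter count shows what "a large number of parameters" and weight sharing mean in practice. The layer sizes are hypothetical, picked only for comparison on a 28x28 grayscale input.

```python
def dense_params(n_in, n_out):
    # One weight per connection plus one bias per node.
    return n_in * n_out + n_out

def conv_params(k, in_channels, out_channels):
    # Each filter's k*k*in_channels weights are shared across every image position.
    return k * k * in_channels * out_channels + out_channels

print(dense_params(28 * 28, 28 * 28))  # 615440
print(conv_params(5, 1, 20))           # 520
```

Note that `conv_params` does not take the image size at all: a larger image means more positions to slide over, but no extra weights, whereas a fully connected layer grows with the product of its input and output sizes.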

Neural Networks and Deep Learning Book

http://neuralnetworksanddeeplearning.com/index.html

Chapter 1

Using neural nets to recognize handwritten digits. The idea is to take a large number of handwritten digits, known as training examples, and develop a system that can learn from those training examples.

A perceptron takes binary inputs and produces a single binary output. Each input is weighted, and the output is 1 if the weighted sum (the dot product of weights and inputs) exceeds a threshold, and 0 otherwise. This is a step function, not the sigmoid; sigmoid neurons, which replace the step with a smooth curve, come later in the chapter. Example: should you go to the cheese festival? Factors: x1, is the weather good? x2, does your girlfriend want to go? etc. A weight applies to each decision factor.
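The cheese-festival example as a perceptron. The weights and threshold are illustrative (they make good weather the deciding factor), and the third factor, whether the festival is near public transit, stands in for the "etc." above.

```python
def perceptron(inputs, weights, threshold):
    """Binary inputs, binary output: 1 if the weighted sum exceeds the threshold."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum > threshold else 0

# x1: good weather, x2: girlfriend wants to go, x3: near public transit
weights = [6, 2, 2]
threshold = 5

print(perceptron([1, 0, 0], weights, threshold))  # 1: good weather alone is enough (6 > 5)
print(perceptron([0, 1, 1], weights, threshold))  # 0: 2 + 2 = 4 does not exceed 5
```

Lowering the threshold (say to 3) would make the same perceptron more willing to go, which is the sense in which weights and threshold encode a decision.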

For the handwriting recognition problem, each image is 28x28 pixels, giving 784 input neurons. The output layer has 10 neurons, one for each digit 0-9.

(I am omitting all the math in this chapter)

Here's how the program presented in the chapter works:

Load the MNIST training data set
Set up the neural network: 784 input neurons, 30 in the second (hidden) layer, 10 in the output layer
Use stochastic gradient descent to learn from the MNIST training data over 30 epochs, with a mini-batch size of 10 and a learning rate of 3.0

Stochastic gradient descent is the standard learning algorithm for neural networks
A mini-batch is a small, randomly chosen subset of the training inputs
Once we have exhausted all training inputs, we have completed an epoch of training
Increasing the learning rate (when it is too small) can improve results
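A minimal sketch of the mini-batch SGD loop described above: shuffle, split into mini-batches, step against the average gradient, and count an epoch each time every input has been used. To keep it self-contained it fits a made-up toy problem (y = 2x + 1) rather than MNIST and uses a learning rate of 0.5 instead of 3.0; `grad_fn` is a hypothetical hook that returns the per-example gradient.

```python
import random

def sgd(training_data, epochs, mini_batch_size, eta, params, grad_fn, seed=0):
    """Mini-batch stochastic gradient descent over a flat list of parameters."""
    rng = random.Random(seed)
    data = list(training_data)
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), mini_batch_size):
            batch = data[start:start + mini_batch_size]      # one mini-batch
            grads = [grad_fn(params, x, y) for x, y in batch]
            avg = [sum(g[i] for g in grads) / len(batch) for i in range(len(params))]
            params = [p - eta * g for p, g in zip(params, avg)]
        # Every training input has now been used once: one epoch completed.
    return params

def grad_squared_error(params, x, y):
    """Gradient of 0.5 * (w*x + b - y)**2 with respect to [w, b]."""
    w, b = params
    err = (w * x + b) - y
    return [err * x, err]

data = [(x / 10, 2 * (x / 10) + 1) for x in range(-10, 11)]
w, b = sgd(data, epochs=30, mini_batch_size=10, eta=0.5, params=[0.0, 0.0],
           grad_fn=grad_squared_error)
print(round(w, 3), round(b, 3))
```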

Deep learning means networks with more (hidden) layers.

Chapter 2

Backprop was originally introduced in the 1970s. It is mathematically intensive. Backprop is the workhorse of learning in neural networks. At the heart of backprop is an expression for the partial derivative ∂C/∂w of the cost function C with respect to any weight w (or bias b) in the network. The expression tells us how quickly the cost changes when we change the weights and biases. It gives insight into how changing the weights and biases changes the overall behavior of the network.
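A small worked check of that partial derivative, assuming a single sigmoid neuron with quadratic cost C = 0.5 (a - y)^2, where a = σ(z) and z = wx + b. The chain rule gives ∂C/∂w = (a - y) σ'(z) x and ∂C/∂b = (a - y) σ'(z); backprop applies the same rule layer by layer. The numbers are made up, and a finite-difference estimate confirms the analytic values.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(w, b, x, y):
    return 0.5 * (sigmoid(w * x + b) - y) ** 2

def analytic_grads(w, b, x, y):
    z = w * x + b
    a = sigmoid(z)
    delta = (a - y) * a * (1 - a)   # dC/dz, using sigmoid'(z) = a * (1 - a)
    return delta * x, delta         # dC/dw, dC/db

def numeric_grads(w, b, x, y, eps=1e-6):
    # Central differences: nudge one parameter, see how much the cost moves.
    dw = (cost(w + eps, b, x, y) - cost(w - eps, b, x, y)) / (2 * eps)
    db = (cost(w, b + eps, x, y) - cost(w, b - eps, x, y)) / (2 * eps)
    return dw, db

w, b, x, y = 0.8, -0.3, 1.5, 1.0
print(analytic_grads(w, b, x, y))
print(numeric_grads(w, b, x, y))   # should agree to several decimal places
```

Here both derivatives are negative: the neuron's output is below the target, so increasing w or b lowers the cost.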

Friday, July 14, 2017

I Jumped!

And, it's official! I jumped.

I left my role as Engineering Manager at Intel to build my own startup company. No details available other than "stealth startup". I am actively recruiting for co-founders, advisors, angel investors, business/strategic partners, and interns (all equity only). My personal runway is sufficient to get started and I am open to short-term (preferably part-time) consulting gigs to extend it.

I am planning between 2 and 3 weeks of vacation before starting full time. First, a week in Arches and Canyonlands National Parks in Utah, then a weekend of hiking in Point Reyes, and a few days here and there by the coast.

Needless to say, this will be a challenge. I have never started a company before. Success to me means the ability to engage in deliberate practice and the exponential professional/personal growth that I expect will follow; company growth, market traction, revenue generation, etc., while important, are all secondary for now.

My belief is that the limiting factor in this endeavor is myself and, therefore, I am spending the necessary energy on my own "upkeep". Besides the obvious aspect of time management, nutrition, sleep, exercise, and (yes) meditation are key components of the plan. If the company "fails" because I "need" to cut corners, so be it.

Fear exists in this statement, but is irrelevant. Here, Susan Jeffers offers the five truths about fear:

The fear will never go away as long as you continue to grow!
The only way to get rid of the fear of doing something is to go out and…do it!
The only way to feel better about yourself is to go out and…do it!
Not only are you afraid when facing the unknown, so is everyone else!
Pushing through fear is less frightening than living with the bigger underlying fear that comes from a feeling of helplessness!

Wish me luck!