TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
A TensorFlow computation is described by a directed graph that represents a dataflow computation, with extensions for maintaining and updating persistent state and for branching and looping control. Each node has zero or more inputs and zero or more outputs, and represents an instance of an operation. Values that flow along normal edges are called tensors. Special edges called control dependencies can also exist. No data flows along such edges; they only require that the source node finish executing before the destination node starts.
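The graph model above can be illustrated with a toy executor. This is a minimal sketch in plain Python, not TensorFlow's implementation: the `Node` class and `run` function are hypothetical names, and the scheduling is an ordinary topological sort (Kahn's algorithm) that counts both normal and control edges as dependencies.

```python
from collections import deque


class Node:
    """One operation instance in a toy dataflow graph (hypothetical class)."""

    def __init__(self, name, fn, inputs=(), control_inputs=()):
        self.name = name
        self.fn = fn                                # called with the input values
        self.inputs = list(inputs)                  # normal edges: values flow along these
        self.control_inputs = list(control_inputs)  # control edges: ordering only, no data


def run(nodes):
    """Execute nodes in dependency order; return (values by name, execution order)."""
    # Number of unfinished dependencies per node, counting both edge kinds.
    pending = {n.name: len(n.inputs) + len(n.control_inputs) for n in nodes}
    consumers = {n.name: [] for n in nodes}
    for n in nodes:
        for dep in n.inputs + n.control_inputs:
            consumers[dep.name].append(n)

    ready = deque(n for n in nodes if pending[n.name] == 0)
    values, order = {}, []
    while ready:
        node = ready.popleft()
        # Only normal edges contribute arguments; control edges carry no data.
        values[node.name] = node.fn(*(values[d.name] for d in node.inputs))
        order.append(node.name)
        for consumer in consumers[node.name]:
            pending[consumer.name] -= 1
            if pending[consumer.name] == 0:
                ready.append(consumer)
    return values, order


a = Node("a", lambda: 2.0)
b = Node("b", lambda: 3.0)
mul = Node("mul", lambda x, y: x * y, inputs=[a, b])
log = Node("log", lambda: "logged", control_inputs=[mul])  # must run after mul
add = Node("add", lambda x, y: x + y, inputs=[mul, a])

values, order = run([log, add, mul, b, a])
print(values["add"], order)
```

The `log` node takes no data from `mul`, yet it cannot run until `mul` has finished, which is the ordering-only behavior of a control dependency.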
An operation has a name and represents an abstract computation (e.g., matrix multiply or add). An operation can have attributes that are provided at graph-construction time. A kernel is an implementation of an operation that can be run on a particular type of device.
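The split between an abstract operation and its per-device kernels can be sketched as a lookup table keyed by operation name and device type. Everything here is illustrative: `KERNELS`, `register_kernel`, and `run_op` are hypothetical names rather than TensorFlow's registration API, and `transpose_b` stands in for an attribute fixed at graph-construction time.

```python
KERNELS = {}  # (operation name, device type) -> kernel implementation


def register_kernel(op_name, device_type):
    """Decorator that records a kernel for one operation on one device type."""
    def decorator(fn):
        KERNELS[(op_name, device_type)] = fn
        return fn
    return decorator


@register_kernel("MatMul", "CPU")
def matmul_cpu(a, b, transpose_b=False):
    # transpose_b plays the role of an operation attribute.
    if transpose_b:
        b = [list(col) for col in zip(*b)]
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


@register_kernel("Add", "CPU")
def add_cpu(a, b):
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def run_op(op_name, device_type, *inputs, **attrs):
    """Look up the kernel for (operation, device type) and run it."""
    try:
        kernel = KERNELS[(op_name, device_type)]
    except KeyError:
        raise NotImplementedError(
            f"no {device_type} kernel registered for {op_name}"
        ) from None
    return kernel(*inputs, **attrs)


m = [[1, 2], [3, 4]]
n = [[5, 6], [7, 8]]
product = run_op("MatMul", "CPU", m, n)
product_t = run_op("MatMul", "CPU", m, n, transpose_b=True)
total = run_op("Add", "CPU", m, n)
```

Requesting `run_op("MatMul", "GPU", m, n)` raises `NotImplementedError` in this sketch, because the operation exists but no kernel has been registered for that device type.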
The main components of a TensorFlow system are the client, which communicates with the master, and one or more worker processes. Each worker process is responsible for arbitrating access to one or more devices (CPUs, GPUs, etc.) and executes the subgraph assigned to it. Communication across devices and worker processes is achieved using send/receive primitives, carried over transports such as TCP or RDMA.
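One way to picture how a graph becomes per-device subgraphs is to cut every edge that crosses a device boundary and replace it with a send/receive pair. The sketch below assumes a graph given as a list of `(source, destination)` edges and a `placement` dict from node name to device name; the `partition` function and the `send/...` and `recv/...` naming scheme are hypothetical, not TensorFlow's.

```python
def partition(edges, placement):
    """Split edges into per-device subgraphs, adding send/recv at device boundaries."""
    subgraphs = {device: [] for device in set(placement.values())}
    for src, dst in edges:
        src_dev, dst_dev = placement[src], placement[dst]
        if src_dev == dst_dev:
            # Both endpoints live on one device: keep the edge as it is.
            subgraphs[src_dev].append((src, dst))
            continue
        send = f"send/{src}->{dst_dev}"
        recv = f"recv/{src}@{dst_dev}"
        # One send per (tensor, destination device), even with several consumers.
        if (src, send) not in subgraphs[src_dev]:
            subgraphs[src_dev].append((src, send))
        subgraphs[dst_dev].append((recv, dst))
    return subgraphs


edges = [("x", "matmul"), ("w", "matmul"), ("matmul", "relu"), ("matmul", "loss")]
placement = {"x": "cpu:0", "w": "gpu:0", "matmul": "gpu:0",
             "relu": "gpu:0", "loss": "cpu:0"}

subgraphs = partition(edges, placement)
for device in sorted(subgraphs):
    print(device, subgraphs[device])
```

After partitioning, each subgraph refers only to nodes on its own device plus send and receive endpoints, so a worker can run it without knowing the rest of the graph. Only the endpoints need to know about the underlying transport.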
Careful scheduling of TensorFlow operations can improve system performance, specifically with respect to data transfers and memory usage (GPU memory in particular is scarce).
Pre-existing highly-optimized numerical libraries (BLAS, cuBLAS, etc.) are used to implement kernels for some operations.
Some ML algorithms, including those typically used for training neural networks, are tolerant of noise and reduced precision arithmetic.
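To give a feel for what reduced precision costs, the sketch below discards the low 16 bits of a 32-bit float (keeping the sign, the 8 exponent bits, and the top 7 mantissa bits) and then restores a 32-bit value by filling the dropped bits with zeros. It is only an illustration of one lossy scheme, not a description of how any particular system implements it. The function names are hypothetical.

```python
import struct


def truncate_to_16_bits(x):
    """Return the top 16 bits of x's float32 encoding as an integer."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16  # drops the 16 least significant mantissa bits


def expand_to_32_bits(half):
    """Rebuild a float32 value, filling the dropped mantissa bits with zeros."""
    (value,) = struct.unpack("<f", struct.pack("<I", half << 16))
    return value


def roundtrip(x):
    return expand_to_32_bits(truncate_to_16_bits(x))


for x in (1.0, 0.1, 3.14159, -1234.5678):
    y = roundtrip(x)
    print(f"{x!r:>12} -> {y!r:<22} relative error {abs(y - x) / abs(x):.2e}")
```

Because the exponent is kept intact, the range of representable magnitudes is unchanged. Only the relative precision drops, and the relative error stays below 2^-7 for normal values.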