Neural Networks
Modeled after the human brain, which consists of billions of neurons interconnected with each other.
Neurons
The human brain contains on the order of 10 to 1000 billion neurons, interconnected with almost impossible complexity.
Each neuron has a large number of inputs, called dendrites.
Depending on the input, a neuron either fires or does not fire.
The human brain is thus a network of these interconnected neurons.
Models for Neurons
Each neuron is represented by a computing element that takes a number of inputs xi and generates an output y.
The output should contain some kind of non-linearity; otherwise, a concatenation of linear stages reduces to a single linear stage, i.e., there would be no need for a multi-layer network.
A sigmoid function is usually employed because it is differentiable, so gradient-descent optimization can be used.
In an actual implementation, the weights are adjusted during training.
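As a concrete sketch of this neuron model (Python with NumPy; the inputs, weights, and threshold below are arbitrary illustrative values, not taken from these notes):

    import numpy as np

    def sigmoid(net):
        # The differentiable non-linearity: 1 / (1 + e^(-NET))
        return 1.0 / (1.0 + np.exp(-net))

    def neuron_output(x, w, theta):
        # NET = weighted sum of the inputs minus the threshold theta
        net = np.dot(w, x) - theta
        # OUT = sigmoid(NET), the neuron's non-linear output y
        return sigmoid(net)

    # A neuron with three inputs and illustrative weights/threshold
    x = np.array([0.5, -1.0, 2.0])   # inputs x_i
    w = np.array([0.4, 0.7, -0.2])   # weights w_i (adjusted during training)
    theta = 0.1                      # threshold
    print(neuron_output(x, w, theta))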
Artificial Neural Networks (ANN)
An ANN is formed by connecting these neurons together.
It requires one input layer and one output layer.
The input layer is connected to the input of the network and contains n neurons, where n is the dimension of the input feature vector, one feature per neuron.
The output layer contains one neuron for each pattern class.
The network may also contain hidden layers in between.
Advantages of Using Neural Networks
High computation rates through massive parallelism.
Fault tolerance: damage to a few nodes or links (primarily local connections) need not impair performance significantly.
Connection weights can be adapted over time to improve performance based on current results.
Can implement very complicated decision functions.
Single Layer Perceptron
The input layer is usually not counted when we talk about the number of layers of a network, because it does not perform any computation.
For a single layer perceptron, there is no hidden layer (i.e., no layer between the input and output layers).
It essentially imposes a linear decision boundary between the two classes.
It cannot handle linearly non-separable cases, e.g., the XOR function.
Training of a Single Layer Perceptron
Step 1: Initialize Weights and Threshold
Set wi(0) (0 ≤ i ≤ N−1) and θ to small random values. Here wi(t) is the weight from input i at time t and θ is the threshold in the output node.
Step 2: Present New Input and Desired Output
Present a new continuous-valued input x0, x1, ..., xN−1 along with the desired output d(t).
Step 3: Calculate Actual Output
y(t) = fh( Σi wi(t) xi(t) − θ ), where fh is the hard-limiting (threshold) activation function.
Step 4: Adapt Weights
wi(t+1) = wi(t) + η [ d(t) − y(t) ] xi(t),   0 ≤ i ≤ N−1
d(t) = +1 if the input is from class A
d(t) = −1 if the input is from class B
In these equations, η is a positive gain fraction less than 1 and d(t) is the desired correct output for the current input. Note that the weights are unchanged if the correct decision is made by the net.
Step 5: Repeat by going to step 2.
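A minimal sketch of steps 1-5 in Python (NumPy assumed; the learning rate η, epoch count, and AND-function training data are illustrative choices, and the threshold θ is folded in as a bias weight on a constant +1 input, a common convenience):

    import numpy as np

    def hard_limiter(net):
        # Output +1 if net >= 0, otherwise -1
        return 1.0 if net >= 0 else -1.0

    def train_perceptron(X, d, eta=0.1, epochs=100):
        # Step 1: initialize weights to small random values; the
        # threshold is absorbed as a bias weight on a constant +1 input
        rng = np.random.default_rng(0)
        w = rng.uniform(-0.05, 0.05, X.shape[1] + 1)
        Xb = np.hstack([X, np.ones((X.shape[0], 1))])
        for _ in range(epochs):
            for x, target in zip(Xb, d):        # Step 2: present input, d(t)
                y = hard_limiter(np.dot(w, x))  # Step 3: actual output y(t)
                # Step 4: wi(t+1) = wi(t) + eta*[d(t) - y(t)]*xi(t);
                # no change when the decision is already correct
                w += eta * (target - y) * x
        return w                                # Step 5: loop over the data

    # Learn the (linearly separable) AND function with classes +1/-1
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    d = np.array([-1.0, -1.0, -1.0, 1.0])
    w = train_perceptron(X, d)
    print([hard_limiter(np.dot(w, np.append(x, 1.0))) for x in X])

The perceptron convergence theorem guarantees that this loop settles on a separating boundary whenever the two classes are linearly separable.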
Multi-Layer Perceptron
Contains one or more hidden layers, i.e., layers not directly connected to either the input or the output nodes of the network.
More powerful than a single layer perceptron (of course).
Example: a network to implement the XOR function (θ is the threshold value of each node).
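The figure with the exact weights is not reproduced here, but one possible realization, with illustrative weights and thresholds of my own choosing and hard-limiting units that fire when the weighted input sum reaches θ, computes XOR as (x1 OR x2) AND NAND(x1, x2):

    def step(net, theta):
        # Hard-limiting unit: fires (1) when net >= theta
        return 1 if net >= theta else 0

    def xor(x1, x2):
        h1 = step(x1 + x2, 0.5)    # hidden unit 1: x1 OR x2
        h2 = step(-x1 - x2, -1.5)  # hidden unit 2: NAND(x1, x2)
        return step(h1 + h2, 1.5)  # output unit: h1 AND h2

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "->", xor(a, b))   # prints the XOR truth table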
Example: a network to implement the AND function, where 0 < d < 1.
Theoretically, it can be proven that a single hidden layer is sufficient to realize any shape of decision boundary.
However, networks with two hidden layers are still widely used because of faster convergence.
Intuitively, we have the following explanation:
1st layer: obtains a number of half-planes (as in the single layer perceptron).
2nd layer: ANDs the half-planes together to get a convex region.
3rd layer: performs further logical operations on these convex regions, using OR, for example.
Training a Multi-layer Perceptron by Backpropagation
The following describes a single neuron with the sigmoidal activation function:
NET = Σi wi xi,   OUT = 1 / (1 + e^(−NET))
The sigmoid function has a simple derivative:
dOUT/dNET = OUT (1 − OUT)
During training, each training pair consists of an input vector and the desired output vector.
Training Algorithm
1. Select the next training pair from the training set; apply the input vector to the network input.
2. Calculate the output of the network.
3. Calculate the error between the network output and the desired output (the target vector from the training pair).
4. Adjust the weights of the network in a way that minimizes the error.
5. If an acceptable recognition result is not obtained, go back to step 1.
(A code sketch of this loop appears after the weight-update equations derived below.)
Adjust the Weights of the Output Layer
This is achieved by using the gradient-descent method.
If we define the total error to be
E = ½ Σq (Targetq − OUTq)²,   q = output unit,
then
∂E/∂OUTq = −(Targetq − OUTq)
and
∂OUTq/∂NETq = OUTq (1 − OUTq).
Also, NETq = Σp wpq OUTp, hence
∂NETq/∂wpq = OUTp, the output of node p in the hidden layer.
By the gradient-descent rule,
Δwpq = −η ∂E/∂wpq = η (Targetq − OUTq) OUTq (1 − OUTq) OUTp
Hence we get the following procedure:
Calculate the δ value for each output node:
δq = OUTq (1 − OUTq)(Targetq − OUTq)
The weight between node p in the hidden layer and node q in the output layer is updated by
Δwpq = η δq OUTp
where η is a training rate coefficient (0.01 to 1.0) and OUTp is the output of neuron p.
Adjust the Weights of the Hidden Layer
Propagate the δ values back, just as the input is propagated through the network, but in the reverse direction:
δp = OUTp (1 − OUTp) Σq ( δq wpq )
Note that here p is in the current (hidden) layer and q is in the previously processed layer, i.e., the next layer toward the output, so the δq are already available. The weight adjustment is similarly determined:
Δwnp = η δp OUTn
where n indexes the nodes in the layer feeding node p.
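Putting the forward pass, the δ rules, and the weight updates together, the following is a minimal sketch of backpropagation for a network with one hidden layer (Python with NumPy; the hidden-layer size, training rate, epoch count, and XOR training data are illustrative choices, and thresholds are folded in as bias weights):

    import numpy as np

    def sigmoid(net):
        return 1.0 / (1.0 + np.exp(-net))

    rng = np.random.default_rng(1)
    # 2 inputs (+bias) -> 4 hidden units; 4 hidden (+bias) -> 1 output
    W1 = rng.uniform(-1.0, 1.0, (3, 4))
    W2 = rng.uniform(-1.0, 1.0, (5, 1))
    eta = 0.5                      # training rate coefficient

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    T = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets

    for _ in range(10000):
        for x, target in zip(X, T):
            # Forward pass: OUT = sigmoid(NET) at each layer
            in1 = np.append(x, 1.0)          # input plus bias term
            out_h = sigmoid(in1 @ W1)        # hidden-layer outputs
            in2 = np.append(out_h, 1.0)
            out_o = sigmoid(in2 @ W2)        # network output

            # Output layer: delta_q = OUTq (1 - OUTq)(Targetq - OUTq)
            delta_o = out_o * (1 - out_o) * (target - out_o)
            # Hidden layer: delta_p = OUTp (1 - OUTp) sum_q delta_q wpq
            # (the bias row of W2 is dropped in the backward pass)
            delta_h = out_h * (1 - out_h) * (W2[:-1] @ delta_o)

            # Weight updates: delta_w_pq = eta * delta_q * OUTp
            W2 += eta * np.outer(in2, delta_o)
            W1 += eta * np.outer(in1, delta_h)

    # After training, the outputs should be close to the XOR targets
    # (an unlucky initialization can stall in a local minimum; a
    # different seed or a restart usually fixes this)
    for x in X:
        out_h = sigmoid(np.append(x, 1.0) @ W1)
        print(x, sigmoid(np.append(out_h, 1.0) @ W2))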
Other Popular Networks
Associative Memory
The Hopfield net, which can perform associative recall.
Adaptive Resonance Theory (ART network)
Neural Network Software
The Stuttgart Neural Network Simulator (SNNS) is a free neural network simulator produced by the Institute of Parallel and Distributed High-Performance Systems (IPVR), University of Stuttgart, Germany.
The latest version is 4.1.
It runs on Sun, PPC (IBM), and Linux, and recently on Windows 95/NT.
Homepage:
http://www.informatik.uni-stuttgart.de/ipvr/bv/projekte/snns/