Neural Networks
Modeled after the human brain, which consists of billions of neurons interconnected with each other.
Neurons
The human brain contains on the order of 10 to 1000 billion neurons, interconnected with almost impossible complexity.
Each neuron has a large number of inputs, called dendrites.
Depending on the input, a neuron either fires or does not fire.
The human brain is thus a network of these interconnected neurons.
Models for Neurons
Each neuron is represented by a computing element that takes a number of inputs xi and generates an output y.
The output should contain some kind of non-linearity; otherwise, a concatenation of linear stages reduces to a single linear stage, i.e., there would be no need for a multi-layer network.
A sigmoid function is usually employed because it is differentiable, so gradient-descent optimization can be used.
In an actual implementation, the weights are adjusted during training.
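As a concrete sketch of this neuron model (Python with NumPy; the inputs, weights, and threshold below are arbitrary illustrative values, not taken from these notes):

    import numpy as np

    def sigmoid(net):
        # The differentiable non-linearity: 1 / (1 + e^(-NET))
        return 1.0 / (1.0 + np.exp(-net))

    def neuron_output(x, w, theta):
        # NET = weighted sum of the inputs minus the threshold theta
        net = np.dot(w, x) - theta
        # OUT = sigmoid(NET), the neuron's non-linear output y
        return sigmoid(net)

    # A neuron with three inputs and illustrative weights/threshold
    x = np.array([0.5, -1.0, 2.0])   # inputs x_i
    w = np.array([0.4, 0.7, -0.2])   # weights w_i (adjusted during training)
    theta = 0.1                      # threshold
    print(neuron_output(x, w, theta))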
Artificial Neural Networks (ANN)
An ANN is formed by connecting these neurons together.
It requires one input layer and one output layer.
The input layer is connected to the input of the network and contains n neurons, where n is the dimension of the input feature vector, one feature per neuron.
The output layer contains one neuron for each pattern class.
The network may also contain hidden layers in between.
Advantages of Using Neural Networks
High computation rates through massive parallelism.
Fault tolerance: damage to a few nodes or links (primarily local connections) need not impair performance significantly.
Connection weights can be adapted over time to improve performance based on current results.
Can implement very complicated decision functions.
Single Layer Perceptron
The input layer is usually not counted when we talk about the number of layers of a network, because it does not perform any computation.
For a single layer perceptron, there is no hidden layer (i.e., no layer between the input and output layers).
It essentially imposes a linear decision boundary between the two classes.
It cannot handle linearly non-separable cases, e.g., the XOR function.
Training of a Single Layer Perceptron
Step 1: Initialize Weights and Threshold
Set wi(0) (0 ≤ i ≤ N−1) and θ to small random values. Here wi(t) is the weight from input i at time t and θ is the threshold in the output node.
Step 2: Present New Input and Desired Output
Present a new continuous-valued input x0, x1, ..., xN−1 along with the desired output d(t).
Step 3: Calculate Actual Output
y(t) = fh( Σi wi(t) xi(t) − θ ), where fh is the hard-limiting (threshold) activation function.
Step 4: Adapt Weights
wi(t+1) = wi(t) + η [ d(t) − y(t) ] xi(t),   0 ≤ i ≤ N−1
d(t) = +1 if the input is from class A
d(t) = −1 if the input is from class B
In these equations, η is a positive gain fraction less than 1 and d(t) is the desired correct output for the current input. Note that the weights are unchanged if the correct decision is made by the net.
Step 5: Repeat by going to step 2.
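A minimal sketch of steps 1-5 in Python (NumPy assumed; the learning rate η, epoch count, and AND-function training data are illustrative choices, and the threshold θ is folded in as a bias weight on a constant +1 input, a common convenience):

    import numpy as np

    def hard_limiter(net):
        # Output +1 if net >= 0, otherwise -1
        return 1.0 if net >= 0 else -1.0

    def train_perceptron(X, d, eta=0.1, epochs=100):
        # Step 1: initialize weights to small random values; the
        # threshold is absorbed as a bias weight on a constant +1 input
        rng = np.random.default_rng(0)
        w = rng.uniform(-0.05, 0.05, X.shape[1] + 1)
        Xb = np.hstack([X, np.ones((X.shape[0], 1))])
        for _ in range(epochs):
            for x, target in zip(Xb, d):        # Step 2: present input, d(t)
                y = hard_limiter(np.dot(w, x))  # Step 3: actual output y(t)
                # Step 4: wi(t+1) = wi(t) + eta*[d(t) - y(t)]*xi(t);
                # no change when the decision is already correct
                w += eta * (target - y) * x
        return w                                # Step 5: loop over the data

    # Learn the (linearly separable) AND function with classes +1/-1
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    d = np.array([-1.0, -1.0, -1.0, 1.0])
    w = train_perceptron(X, d)
    print([hard_limiter(np.dot(w, np.append(x, 1.0))) for x in X])

The perceptron convergence theorem guarantees that this loop settles on a separating boundary whenever the two classes are linearly separable.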
Multi-Layer Perceptron
Contains one or more hidden layers, i.e., layers not directly connected to either the input or the output nodes of the network.
More powerful than a single layer perceptron (of course).
Example: a network to implement the XOR function (θ is the threshold value of each node).
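The figure with the exact weights is not reproduced here, but one possible realization, with illustrative weights and thresholds of my own choosing and hard-limiting units that fire when the weighted input sum reaches θ, computes XOR as (x1 OR x2) AND NAND(x1, x2):

    def step(net, theta):
        # Hard-limiting unit: fires (1) when net >= theta
        return 1 if net >= theta else 0

    def xor(x1, x2):
        h1 = step(x1 + x2, 0.5)    # hidden unit 1: x1 OR x2
        h2 = step(-x1 - x2, -1.5)  # hidden unit 2: NAND(x1, x2)
        return step(h1 + h2, 1.5)  # output unit: h1 AND h2

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "->", xor(a, b))   # prints the XOR truth table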
Example: a network to implement the AND function, where 0 < d < 1.
Theoretically, it can be proven that a single hidden layer is sufficient to realize any shape of decision boundary.
However, networks with two hidden layers are still widely used because of faster convergence.
Intuitively, we have the following explanation:
1st layer: obtains a number of half-planes (as in the single layer perceptron).
2nd layer: ANDs the half-planes together to get a convex region.
3rd layer: performs further logical operations on these convex regions, using OR, for example.
Training a Multi-layer Perceptron by Backpropagation
The following describes a single neuron with the sigmoidal activation function:
NET = Σi wi xi,   OUT = 1 / (1 + e^(−NET))
The sigmoid function has a simple derivative:
dOUT/dNET = OUT (1 − OUT)
During training, each training pair consists of an input vector and the desired output vector.
Training Algorithm
1. Select the next training pair from the training set; apply the input vector to the network input.
2. Calculate the output of the network.
3. Calculate the error between the network output and the desired output (the target vector from the training pair).
4. Adjust the weights of the network in a way that minimizes the error.
5. If an acceptable recognition result is not obtained, go back to step 1.
(A code sketch of this loop appears after the weight-update equations derived below.)
Adjust the Weights of the Output Layer
This is achieved by using the gradient-descent method.
If we define the total error to be
E = ½ Σq (Targetq − OUTq)²,   q = output unit,
then
∂E/∂OUTq = −(Targetq − OUTq)
and
∂OUTq/∂NETq = OUTq (1 − OUTq).
Also, NETq = Σp wpq OUTp, hence
∂NETq/∂wpq = OUTp, the output of node p in the hidden layer.
By the gradient-descent rule,
Δwpq = −η ∂E/∂wpq = η (Targetq − OUTq) OUTq (1 − OUTq) OUTp
Hence we get the following procedure:
Calculate the δ value for each output node:
δq = OUTq (1 − OUTq)(Targetq − OUTq)
The weight between node p in the hidden layer and node q in the output layer is updated by
Δwpq = η δq OUTp
where η is a training rate coefficient (0.01 to 1.0) and OUTp is the output of neuron p.
Adjust the Weights of the Hidden Layer
Propagate the δ values back, just as the input is propagated through the network, but in the reverse direction:
δp = OUTp (1 − OUTp) Σq ( δq wpq )
Note that here p is in the current (hidden) layer and q is in the previously processed layer, i.e., the next layer toward the output, so the δq are already available. The weight adjustment is similarly determined:
Δwnp = η δp OUTn
where n indexes the nodes in the layer feeding node p.
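Putting the forward pass, the δ rules, and the weight updates together, the following is a minimal sketch of backpropagation for a network with one hidden layer (Python with NumPy; the hidden-layer size, training rate, epoch count, and XOR training data are illustrative choices, and thresholds are folded in as bias weights):

    import numpy as np

    def sigmoid(net):
        return 1.0 / (1.0 + np.exp(-net))

    rng = np.random.default_rng(1)
    # 2 inputs (+bias) -> 4 hidden units; 4 hidden (+bias) -> 1 output
    W1 = rng.uniform(-1.0, 1.0, (3, 4))
    W2 = rng.uniform(-1.0, 1.0, (5, 1))
    eta = 0.5                      # training rate coefficient

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    T = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets

    for _ in range(10000):
        for x, target in zip(X, T):
            # Forward pass: OUT = sigmoid(NET) at each layer
            in1 = np.append(x, 1.0)          # input plus bias term
            out_h = sigmoid(in1 @ W1)        # hidden-layer outputs
            in2 = np.append(out_h, 1.0)
            out_o = sigmoid(in2 @ W2)        # network output

            # Output layer: delta_q = OUTq (1 - OUTq)(Targetq - OUTq)
            delta_o = out_o * (1 - out_o) * (target - out_o)
            # Hidden layer: delta_p = OUTp (1 - OUTp) sum_q delta_q wpq
            # (the bias row of W2 is dropped in the backward pass)
            delta_h = out_h * (1 - out_h) * (W2[:-1] @ delta_o)

            # Weight updates: delta_w_pq = eta * delta_q * OUTp
            W2 += eta * np.outer(in2, delta_o)
            W1 += eta * np.outer(in1, delta_h)

    # After training, the outputs should be close to the XOR targets
    # (an unlucky initialization can stall in a local minimum; a
    # different seed or a restart usually fixes this)
    for x in X:
        out_h = sigmoid(np.append(x, 1.0) @ W1)
        print(x, sigmoid(np.append(out_h, 1.0) @ W2))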
Other Popular Networks
Associative Memory
The Hopfield net, which can perform associative recall.
Adaptive Resonance Theory (ART network)
Neural Network Software
The Stuttgart Neural Network Simulator (SNNS) is a free neural network simulator produced by the Institute of Parallel and Distributed High-Performance Systems (IPVR), University of Stuttgart, Germany.
The latest version is 4.1.
It runs on Sun, PPC (IBM), and Linux, and recently on Windows 95/NT.
Homepage:
http://www.informatik.uni-stuttgart.de/ipvr/bv/projekte/snns/