Artificial Neural Networks
Li M. Fu
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Introduction
- Neural network learning provides a general, practical method for learning
real-valued, discrete-valued, and vector-valued functions from examples.
- It handles noise, uncertainty, and data inconsistency
better than symbolic learning methods.
- The backpropagation algorithm is a popular and useful learning
technique for tuning network parameters to fit
a training set of input-output pairs.
Background
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Topics
- Neural network representation
- Perceptrons
- Multilayer neural networks
- The backpropagation learning algorithm
- A case study: face recognition
- Other neural network models
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
The Neural Network Inference Algorithm (A general view)
Given an input instance,
1. Present the instance to the network on the input layer.
2. Calculate the activation levels of nodes across the network.
3. For a feedforward network, if the activation levels
of all output units are calculated, then exit.
For a recurrent network, if the activation levels of all output units
become (near) constant, then exit;
else go to step 2.
However, if the network is found to be unstable (its activations never settle),
then exit and fail.
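The inference procedure above can be sketched as follows. This is a minimal illustration under several assumptions that are not in the original. Units use a sigmoid activation. Each weight row ends with a bias weight. The recurrent net updates all units synchronously, with the external input added to every unit's net input. The hypothetical parameters `tol` and `max_steps` decide what counts as "near constant" and "unstable". For simplicity, the sketch checks that all units settle, not only the output units.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def feedforward_infer(layers, x):
    """Single forward pass: once the last layer is computed, we are done.
    `layers` is a list of weight matrices; each row holds one unit's weights,
    and its last entry is the bias weight (applied to a constant input 1.0)."""
    a = list(x)
    for w in layers:
        a = [sigmoid(sum(wi * ai for wi, ai in zip(row, a + [1.0]))) for row in w]
    return a

def recurrent_infer(w, x, tol=1e-6, max_steps=1000):
    """Recompute activations until they are (near) constant.
    Unit i receives the clamped external input x[i] plus the weighted
    activations of all units (synchronous update, an assumption)."""
    a = [0.0] * len(w)
    for _ in range(max_steps):
        new = [sigmoid(x[i] + sum(w[i][j] * a[j] for j in range(len(a))))
               for i in range(len(a))]
        if max(abs(n - o) for n, o in zip(new, a)) < tol:
            return new          # activations have settled: exit
        a = new                 # otherwise go back and recompute
    raise RuntimeError("activations did not settle: network is unstable")
```

Design note: the `max_steps` cap makes the "exit and fail" branch concrete. Two units that inhibit each other strongly, such as weights of -20 between them with inputs of 10, flip between high and low on every synchronous update and never settle.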
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
The Neural Network Learning Algorithm (A general view)
Given n training instances,
1. Initialize the network weights. Set i = 1.
2. Present the ith instance to the network on the input layer.
3. Obtain the activation levels of the output units
using the inference algorithm.
If the network performance meets the predefined standard
(or the stopping criterion), then exit.
4. Update the weights by the learning rule of the network.
5. If i = n, then reset i = 1;
otherwise, increment i by 1. Go to step 2.
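The learning loop above cycles repeatedly through the n instances. The sketch below shows that control flow. The network itself is abstracted behind three hypothetical callbacks: `infer`, `update` (the network's learning rule) and `meets_standard` (the stopping criterion). Weight initialization is left to the caller. The `max_presentations` cap is an added safeguard that is not part of the original procedure. For the usage example, the learning rule is assumed to be the delta (LMS) rule on a single linear unit.

```python
def train(instances, infer, update, meets_standard, max_presentations=10000):
    """Cycle through the training instances until the stopping criterion holds.
    Returns True if the criterion was met, False if the cap was reached."""
    n = len(instances)
    i = 0                                  # 0-based index of the current instance
    for _ in range(max_presentations):
        x, t = instances[i]                # present the ith instance
        o = infer(x)                       # obtain output activations
        if meets_standard():               # stopping criterion
            return True
        update(x, t, o)                    # apply the network's learning rule
        i = 0 if i == n - 1 else i + 1     # wrap around after the nth instance
    return False

# Usage (hypothetical): a single linear unit learns t = 2x with the delta rule.
w = [0.0]                                  # weight initialization
data = [([1.0], 2.0), ([2.0], 4.0), ([3.0], 6.0)]

def infer(x):
    return w[0] * x[0]

def update(x, t, o):
    w[0] += 0.05 * (t - o) * x[0]          # learning rate 0.05

def meets_standard():
    return all(abs(t - infer(x)) < 1e-3 for x, t in data)

converged = train(data, infer, update, meets_standard)
```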
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Perceptrons
- A single-layer perceptron is in essence a linear discriminant
that uses a hard-limiting threshold function.
- A single-layer perceptron is quite limited in its representational power.
For instance, it cannot represent the exclusive-or (XOR) function.
- A multilayer perceptron is more flexible. With more layers (and enough
units), it can form arbitrarily complex decision regions.
- However, there was no good algorithm for training
a multilayer perceptron until the backpropagation algorithm emerged.
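The XOR limitation can be illustrated in a few lines. A brute-force search over a grid of weights and thresholds finds no single threshold unit that computes XOR. This only illustrates the claim; the proof is that XOR is not linearly separable. A two-layer perceptron with hand-chosen weights does compute XOR by combining OR and AND hidden units. The grid, the hidden-unit weights and the rule `step(0) = 0` are all illustrative assumptions.

```python
import itertools

def step(z):
    # Hard-limiting threshold: 1 for a positive net input, else 0 (boundary assumed)
    return 1 if z > 0 else 0

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def single_layer_fits(w1, w2, theta):
    """Does one threshold unit with these weights reproduce XOR on all 4 inputs?"""
    return all(step(w1 * a + w2 * b - theta) == t for (a, b), t in XOR.items())

grid = [k / 2 for k in range(-10, 11)]              # -5.0, -4.5, ..., 5.0
single_layer_found = any(single_layer_fits(*p)
                         for p in itertools.product(grid, repeat=3))

def two_layer_xor(a, b):
    h_or = step(a + b - 0.5)          # hidden unit computing OR
    h_and = step(a + b - 1.5)         # hidden unit computing AND
    return step(h_or - h_and - 0.5)   # output: OR and not AND

print(single_layer_found)                                    # False
print([two_layer_xor(a, b) for a, b in XOR])                 # [0, 1, 1, 0]
```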
Perceptron Learning Algorithm
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Perceptrons: An Example
In a single-layer perceptron, unit 1 receives
inputs from units 2 and 3.
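Given that
W_{1,2} = -3, W_{1,3} = 2, X_{2} = 1, X_{3} = 1, theta_{1} = 1,
how is O_{1} calculated?
O_{1} = F_{h}(W_{1,2} X_{2} + W_{1,3} X_{3} - theta_{1}) = F_{h}(-3 x 1 + 2 x 1 - 1) = F_{h}(-2) = 0,
where F_{h} is the hard-limiting threshold function, which outputs 0 for a negative argument.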
Now, if the desired output is T_{1} = 1, how do we adjust the weights?
Assume that the learning rate eta = 0.3.
delta_{1} = T_{1} - O_{1} = 1 - 0 = 1
Delta_W_{1,2} = eta * delta_{1} * X_{2} = 0.3 x 1 x 1 = 0.3
Delta_W_{1,3} = eta * delta_{1} * X_{3} = 0.3 x 1 x 1 = 0.3
W_{1,2} = -3 + 0.3 = -2.7
W_{1,3} = 2 + 0.3 = 2.3
The threshold is the negative of the weight W_{1,b} from the bias
unit. That is, W_{1,b} = - theta_{1} = -1.
Delta_W_{1,b} = eta * delta_{1} * 1 = 0.3 x 1 x 1 = 0.3 (the bias unit always outputs 1)
W_{1,b} = -1 + 0.3 = -0.7
Thus, the threshold is changed to 0.7.
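The worked example can be checked with a minimal sketch of the perceptron rule. The bias unit is treated as an extra input fixed at 1 whose weight is -theta_{1}. The function names are hypothetical. As before, `F_h(0) = 0` is an assumption, although no net input in this example is exactly 0. The second part of the code repeats the update on the same instance. Because all inputs are 1, each update raises the net input by 3 x 0.3 = 0.9, so the unit first outputs 1 after the third update.

```python
def hard_limit(z):
    # F_h: 1 for a positive net input, 0 otherwise (boundary at 0 assumed)
    return 1 if z > 0 else 0

def perceptron_output(weights, inputs, bias_weight):
    # The bias unit always outputs 1; its weight W_{1,b} equals -theta_1.
    net = sum(w * x for w, x in zip(weights, inputs)) + bias_weight * 1
    return hard_limit(net)

def perceptron_update(weights, inputs, bias_weight, target, eta):
    """One application of Delta_W = eta * delta * X with delta = T - O."""
    delta = target - perceptron_output(weights, inputs, bias_weight)
    new_weights = [w + eta * delta * x for w, x in zip(weights, inputs)]
    new_bias = bias_weight + eta * delta * 1
    return new_weights, new_bias

w, wb = [-3.0, 2.0], -1.0          # W_{1,2}, W_{1,3}, W_{1,b} = -theta_1
x = [1, 1]                         # X_2, X_3
print(perceptron_output(w, x, wb))            # 0, since F_h(-2) = 0
w1, wb1 = perceptron_update(w, x, wb, target=1, eta=0.3)
print(w1, wb1)                     # close to [-2.7, 2.3] and -0.7 (threshold 0.7)

# Keep updating on the same instance until the output reaches T_1 = 1.
steps, w, wb = 1, w1, wb1
while perceptron_output(w, x, wb) != 1:
    w, wb = perceptron_update(w, x, wb, target=1, eta=0.3)
    steps += 1
print(steps)                       # 3 (net input: -2 -> -1.1 -> -0.2 -> 0.7)
```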
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Backpropagation
- Gradient descent in a multilayer network
- The learning algorithm
- Convergence and local minima
- Representation power: Boolean, continuous, arbitrary functions
- Hypothesis space and inductive bias
- Hidden layer representation
- Stopping conditions
- Generalization
- VC dimension
- Weight decay
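The core of backpropagation, gradient descent in a multilayer network, can be sketched for one hidden layer of sigmoid units. The sketch takes one incremental step on the squared error E = 1/2 sum_k (t_k - o_k)^2. It uses the standard error terms delta_k = o_k(1 - o_k)(t_k - o_k) for output units and delta_h = o_h(1 - o_h) sum_k w_kh delta_k for hidden units, followed by the update Delta w_ji = eta * delta_j * x_ji. The layout and names are assumptions: a single hidden layer, bias weights stored as the last entry of each row, and the hypothetical `forward` and `backprop_step`. This is not the Matlab implementation referenced in the supplementary material.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w_hid, w_out, x):
    """Return (inputs + bias, hidden activations + bias, outputs).
    A constant 1.0 plays the role of the bias unit at each layer."""
    xb = list(x) + [1.0]
    h = [sigmoid(sum(w * xi for w, xi in zip(row, xb))) for row in w_hid]
    hb = h + [1.0]
    o = [sigmoid(sum(w * hi for w, hi in zip(row, hb))) for row in w_out]
    return xb, hb, o

def backprop_step(w_hid, w_out, x, t, eta):
    """One incremental gradient-descent step on E = 1/2 * sum_k (t_k - o_k)^2.
    Updates both weight matrices in place and returns E before the update."""
    xb, hb, o = forward(w_hid, w_out, x)
    # Output units: delta_k = o_k (1 - o_k) (t_k - o_k)
    d_out = [ok * (1 - ok) * (tk - ok) for ok, tk in zip(o, t)]
    # Hidden units: delta_h = o_h (1 - o_h) * sum_k w_kh delta_k, using the old w_kh
    d_hid = [hb[h] * (1 - hb[h]) * sum(w_out[k][h] * d_out[k] for k in range(len(w_out)))
             for h in range(len(w_hid))]
    # Every weight: Delta w_ji = eta * delta_j * x_ji
    for k, row in enumerate(w_out):
        for h in range(len(row)):
            row[h] += eta * d_out[k] * hb[h]
    for h, row in enumerate(w_hid):
        for i in range(len(row)):
            row[i] += eta * d_hid[h] * xb[i]
    return 0.5 * sum((tk - ok) ** 2 for tk, ok in zip(t, o))

if __name__ == "__main__":
    # Usage: a 2-3-1 network on XOR with small random initial weights, eta = 0.5
    random.seed(1)
    w_hid = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(3)]
    w_out = [[random.uniform(-0.5, 0.5) for _ in range(4)]]
    data = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])]
    for epoch in range(5000):
        total = sum(backprop_step(w_hid, w_out, x, t, 0.5) for x, t in data)
    print("error in final epoch:", total)
```

The usage run may end in a local minimum rather than solving XOR, depending on the initial weights. This is the "convergence and local minima" issue listed above.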
Backpropagation Method
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
A Case Study: Face Recognition
- Task: Classification of camera images of faces into four poses (left,
right, up, and straight).
- Data: 624 greyscale images, each with 120 x 128 pixels, each pixel described
by a value between 0 (black) and 255 (white).
- Input encoding: a coarse 30 x 32 input array (960 inputs), each input value
normalized to the range 0-1.
- Output encoding: four output units (1-of-n encoding).
- Network structure: a 960-3-4 configuration (960 inputs, 3 hidden units, 4 outputs).
- Test performance: 90% accuracy on the test set.
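The input and output encodings of the case study can be sketched as follows. The original does not say how the coarse 30 x 32 array is computed. This sketch assumes each coarse pixel is the mean of a non-overlapping block of original pixels, for example 4 x 4 blocks of a 120 x 128 image, divided by 255. It also assumes 1-of-n targets of 1.0 and 0.0, with the predicted pose taken as the most active output unit. All function names are hypothetical.

```python
def encode_image(pixels, out_rows=30, out_cols=32):
    """Average non-overlapping blocks of a greyscale image (values 0-255)
    down to out_rows x out_cols, then scale each value into [0, 1].
    Returns a flat list of out_rows * out_cols network inputs (960 by default)."""
    rows, cols = len(pixels), len(pixels[0])
    br, bc = rows // out_rows, cols // out_cols        # block size, e.g. 4 x 4
    inputs = []
    for r in range(out_rows):
        for c in range(out_cols):
            block = [pixels[r * br + i][c * bc + j] for i in range(br) for j in range(bc)]
            inputs.append(sum(block) / len(block) / 255.0)
    return inputs

POSES = ["left", "right", "up", "straight"]

def encode_pose(pose):
    # 1-of-n target vector: exactly one of the four output units is on
    return [1.0 if p == pose else 0.0 for p in POSES]

def decode_pose(outputs):
    # The predicted pose is the output unit with the highest activation
    return POSES[max(range(len(outputs)), key=lambda k: outputs[k])]
```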
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Other Neural Network Models
- Associative memories
- Self-organizing maps
- Recurrent neural networks
- Cascade correlation
- Support vector machines
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Associative Memories
Associative Memories and Hopfield Nets
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Summary
- ANNs provide a general and practical method for machine learning.
The approach has led to numerous applications.
- What is the hypothesis space?
- What is the search strategy?
- How are new features invented?
- How can overfitting of the training data be avoided?
Other Supplementary Material
Perceptron Learning Algorithm
Matlab-Backpropagation