Neural Networks Tutorial
Hydroinformatics 98
Lars Kai Hansen
Department of Mathematical Modeling
Building 321
Technical University of Denmark
DK-2800 Lyngby, DENMARK
email: lkhansen@imm.dtu.dk
http://eivind.imm.dtu.dk
Neural networks are increasingly
popular tools for modeling complex dynamics, noisy time series and pattern recognition problems
which arise, e.g., in hydroinformatics.
Neural networks are often considered as so-called
black box models. They are indeed very well suited for
modeling systems in which the underlying rules
are hard to reveal. Neural nets learn
statistical relations from observations rather than relying on algorithmic
"solutions".
Most standard
neural network architectures possess the property of being universal
learners, i.e., by choosing the architecture carefully it is possible to
learn any task (static relation).
The network provides a relation from a set of input variables to a set of
output variables, and hence captures aspects of the conditional distribution
of the output variables, conditioned on the inputs.
The network is trained to perform the desired task by minimizing
a performance measure or cost function with respect to the
network parameters on the set of training data consisting of
input-output examples. Typical cost functions are the mean square error
(for regression or function approximation) and the entropic error
measure (for pattern recognition nets). Cost functions can be derived
by applying maximum likelihood methods or by using the so-called Bayesian framework.
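The two cost functions mentioned above can be written down in a few lines. The following is a minimal sketch, not taken from the tutorial itself: the function names are our own, and the entropic measure is shown only in its two-class form, where the targets are 0/1 labels and the network outputs are interpreted as class probabilities.

```python
import math

def mean_square_error(targets, outputs):
    """Average squared difference between targets and network outputs."""
    return sum((t - y) ** 2 for t, y in zip(targets, outputs)) / len(targets)

def cross_entropy_error(targets, probabilities, eps=1e-12):
    """Two-class entropic error: targets are 0/1, probabilities lie in (0, 1)."""
    total = 0.0
    for t, p in zip(targets, probabilities):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(targets)

# Hypothetical targets and network outputs, for illustration only.
print(mean_square_error([1.0, 2.0], [1.0, 4.0]))
print(cross_entropy_error([1, 0], [0.9, 0.2]))
```

Under a maximum likelihood reading, the mean square error corresponds to assuming Gaussian noise on the outputs, and the two-class entropic error to assuming Bernoulli-distributed class labels.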
Application of Neural Networks
Basic considerations
Three basic issues should be addressed
before applying neural networks in the real world:
What are the input variables of interest?
What should be predicted?
How is success measured?
Checklist: Topics for choice of Neural Network Paradigm
Here we list the most important questions
arising when choosing among the different
neural network methodologies (paradigms).
See
Chris Bishop's recent textbook
for a general introduction to these topics.
Computational universality.
A wide class of feed-forward
networks have been shown to be universal function approximators, i.e., given enough hidden units they can approximate any reasonable (e.g., continuous) function to any desired accuracy.
In plain words: you can be sure that it is possible to solve
the task with a network in the class.
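A common intuition for universality is that hidden units can be combined into localized "bumps", and sums of bumps can follow any reasonable curve. The sketch below illustrates this with a one-hidden-layer network; the function name `forward` and the hand-picked weights in `BUMP` are illustrative assumptions, not values from any trained model.

```python
import math

def forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """One-hidden-layer feed-forward net: tanh hidden units, linear output."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sum(v * h for v, h in zip(output_weights, hidden)) + output_bias

# Hand-picked weights giving 0.5*tanh(5(x+1)) - 0.5*tanh(5(x-1)):
# two opposing sigmoids that form a bump over roughly [-1, 1].
BUMP = dict(
    hidden_weights=[[5.0], [5.0]],
    hidden_biases=[5.0, -5.0],
    output_weights=[0.5, -0.5],
    output_bias=0.0,
)

for x in (-3.0, 0.0, 3.0):
    print(x, round(forward([x], **BUMP), 3))
```

The output is close to 1 in the middle of the interval and close to 0 far outside it. Adding more hidden units adds more such building blocks, which is why the number of hidden units controls how complicated a function the network can represent.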
Efficient learning schemes. It is comforting to know
that the task in question can be solved, but it is equally
important to be sure that the learning
algorithm makes efficient use of the floating point
calculations it invokes, and identifies a relevant network solution: if not optimal, then at least useful.
Architecture Optimization.
The ``hidden agenda'' in statistical modeling is that
we would like to
perform well on ALL conceivable data, not just the data
in the training set, i.e., we would like to perform
well on future test inputs. Merely fitting the training set is of
no interest. Most learning problems and statistical
modeling tasks are concerned with the compromise between misfit and
overfitting, also known as the bias-variance dilemma. If the model
complexity is too low, the learning machine will have a large
misfit, e.g., a large training error. If the model complexity is too high,
the model will memorize the training data and hence have a very low
training error, but the test error will be high. The test error
is defined as the expected error on a hitherto unseen test example.
The two most widely used techniques for implementing this
compromise are regularization and pruning.
Both schemes have been shown to improve test performance
significantly.
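As an illustration of regularization, the sketch below adds a weight decay term (a penalty on the sum of squared weights) to the mean square error of a model that is linear in its weights. This is only one simple form of regularization, chosen here for brevity; the function name, the polynomial features and the data values are all assumptions made for the example.

```python
def train_weight_decay(inputs, targets, decay, rate=0.05, epochs=2000):
    """Gradient descent on: mean squared error + decay * sum(w**2),
    for a model y = sum_j w[j] * x[j] that is linear in its weights."""
    n_weights = len(inputs[0])
    n_examples = len(inputs)
    w = [0.0] * n_weights
    for _ in range(epochs):
        grad = [2.0 * decay * wj for wj in w]  # gradient of the penalty
        for x, t in zip(inputs, targets):
            err = sum(wj * xj for wj, xj in zip(w, x)) - t
            for j in range(n_weights):
                grad[j] += 2.0 * err * x[j] / n_examples
        w = [wj - rate * gj for wj, gj in zip(w, grad)]
    return w

# Hypothetical data: five noisy points, fitted with a degree-5 polynomial,
# i.e. more flexibility than five examples can reasonably support.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
features = [[x ** k for k in range(6)] for x in xs]
targets = [0.9, -0.2, 0.1, 0.3, 1.2]

for decay in (0.0, 0.1):
    weights = train_weight_decay(features, targets, decay)
    print(decay, sum(wj * wj for wj in weights))
```

The penalty shrinks the weights towards zero and so limits the effective model complexity. How much decay to use is itself a choice that has to be made by estimating test performance, not by looking at the training error.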
Statistical Evaluation.
The test error, being defined as an
average over all possible test examples, cannot be measured directly. It is, however, possible to estimate test performance, either using statistical theory or test sets.
Test sets are data from the database that are held out during training. By repeating the training procedure
with different training/test set splits of the database, a so-called cross-validation scheme may be implemented.
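The repeated-split procedure can be sketched as K-fold cross-validation. The code below is a minimal sketch under our own naming: `fit` and `predict` are placeholders for whatever training and prediction routines are in use, and the squared error is assumed as the performance measure.

```python
import random

def k_fold_splits(n_examples, k, seed=0):
    """Yield (train_indices, test_indices) for each of k disjoint test folds."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, folds[i]

def cross_validate(inputs, targets, k, fit, predict, seed=0):
    """Average held-out squared error over k training/test splits."""
    fold_errors = []
    for train, test in k_fold_splits(len(inputs), k, seed):
        model = fit([inputs[i] for i in train], [targets[i] for i in train])
        squared = [(predict(model, inputs[i]) - targets[i]) ** 2 for i in test]
        fold_errors.append(sum(squared) / len(squared))
    return sum(fold_errors) / k

# Toy usage with a hypothetical "model" that always predicts the training mean.
fit_mean = lambda xs, ts: sum(ts) / len(ts)
predict_mean = lambda model, x: model
print(cross_validate(list(range(6)), [1.0, 2.0, 3.0, 2.0, 1.0, 3.0], 3,
                     fit_mean, predict_mean))
```

Every example is used for testing exactly once and is never part of the training set it is tested against, so the averaged error is an estimate of test performance, not of training fit.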
Active learning. For a number of the standard neural net models it has been shown to be possible to use the trained neural net to guide the expert in providing more examples. This can, e.g., be implemented by computing the kinds of input for which getting the ``teacher'' output
will be most informative and hence, after subsequent training, increase the retrained network's test performance.
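One simple way to rank candidate inputs by informativeness is to ask several trained networks for their predictions and pick the input on which they disagree most. This disagreement criterion is an assumption made for the sketch below, not necessarily the scheme intended above, and the three "committee members" are stand-in functions, not trained networks.

```python
def most_informative_input(candidates, committee):
    """Return the candidate input on which the committee's predictions vary most."""
    def disagreement(x):
        predictions = [member(x) for member in committee]
        mean = sum(predictions) / len(predictions)
        return sum((p - mean) ** 2 for p in predictions) / len(predictions)
    return max(candidates, key=disagreement)

# Hypothetical committee: three models that agree near x = 0 and diverge for large x.
committee = [lambda x: x, lambda x: x ** 2, lambda x: 0.5 * x]
print(most_informative_input([0.0, 1.0, 3.0], committee))
```

The selected input is then passed to the expert (the "teacher") for labeling, the new example is added to the training set, and the networks are retrained.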
Ensembles
When the specifics of the problem have been resolved and
a given family of network algorithms has been chosen, one faces an optimization problem with many feasible or near-feasible solutions. Since neural networks involve non-linear adaptation, the training process often provides a wide variety of solutions,
e.g., due to random initializations or random sequencing of examples. Rather than dismissing solutions, it is recommended to form collective decisions, i.e., to form a consensus among the ensemble of network solutions.
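Two common ways of forming such a consensus are a majority vote over predicted class labels and a plain average of real-valued outputs. The sketch below shows both; the function names are our own, and the "networks" are stand-in functions used only to make the example runnable.

```python
from collections import Counter

def majority_vote(networks, x):
    """Classification consensus: the label predicted by most networks."""
    votes = Counter(net(x) for net in networks)
    return votes.most_common(1)[0][0]

def ensemble_average(networks, x):
    """Regression consensus: the mean of the individual network outputs."""
    outputs = [net(x) for net in networks]
    return sum(outputs) / len(outputs)

# Hypothetical ensemble members, e.g. the same architecture trained from
# different random initializations.
classifiers = [lambda x: "wet", lambda x: "dry", lambda x: "wet"]
regressors = [lambda x: 1.0, lambda x: 2.0, lambda x: 3.0]

print(majority_vote(classifiers, None))
print(ensemble_average(regressors, None))
```

The consensus helps to the extent that the individual networks make different errors; members that all fail on the same inputs gain little from being combined.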
A related, though different, approach has been advocated in work on "Mixture of experts". These algorithms involve a gating mechanism that decides
which network (or group of networks) to rely on for a specific input. One may think of this tool as a way of using networks that are specialised in certain regions of input space.
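The gating idea can be sketched as a weighted sum in which the weights depend on the input. In the minimal sketch below the gate is a softmax over per-expert scores, which is one common choice; the experts and gate scores are hand-written stand-ins for trained networks, and all names are illustrative.

```python
import math

def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to one."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mixture_output(x, experts, gate_scores):
    """Combine expert outputs using input-dependent gating weights."""
    weights = softmax([score(x) for score in gate_scores])
    return sum(w * expert(x) for w, expert in zip(weights, experts))

# Hypothetical setup: one expert responsible for negative inputs and one for
# positive inputs, with the gate switching between them around x = 0.
experts = [lambda x: -1.0, lambda x: 1.0]
gate_scores = [lambda x: -10.0 * x, lambda x: 10.0 * x]

for x in (-2.0, 0.0, 2.0):
    print(x, round(mixture_output(x, experts, gate_scores), 3))
```

Unlike a plain ensemble average, the weights here change with the input, so each expert dominates only in the region of input space where the gate trusts it.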
Of course, one may also consider forming a consensus among models derived from different network families. Little experience has been reported with such heterogeneous ensembles.
The DTU group's neural network WWW repository
Our on-line postscript papers
Selected references
H. Akaike: "Fitting Autoregressive
Models for Prediction", Annals of the Institute of Statistical Mathematics, vol. 21, 243--247, 1969.
Key paper on generalization error estimation
S. Geman, E. Bienenstock and R. Doursat, "Neural Networks and the Bias/Variance Dilemma". Neural Computation, vol. 4, pp. 1--58, 1992. Best description of the Bias-Variance dilemma I know.
L.K. Hansen and J. Larsen: "Linear Unlearning for Cross-Validation". Advances in Computational Mathematics.
Introduces unlearning for neural networks as a technique
for approximate cross-validation.
L.K. Hansen and P. Salamon: "Neural Network Ensembles".
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, no. 10, pp. 993--1001, Oct. 1990. One of the first papers on neural net ensembles.
J. Hertz, A. Krogh and R.G. Palmer:
"Introduction to the Theory of Neural Computation".
Redwood City, California: Addison-Wesley Publishing Company, 1991. Classic neural network textbook, still one of the best texts to learn neural computing theory from.
J. Larsen & L.K. Hansen: "Empirical Generalization Assessment of Neural Network Models". In F. Girosi, J. Makhoul, E. Manolakos & E. Wilson (eds.),
Proceedings of the IEEE Workshop on Neural Networks for
Signal Processing V, Piscataway, New Jersey: IEEE, pp. 30--39, 1995. Proposes many measures of generalization performance.
Y. Le Cun, J.S. Denker and S.A. Solla: "Optimal Brain Damage". In D.S. Touretzky (ed.) Advances in Neural Information Processing Systems 2, Proceedings of the 1989 Conference, San Mateo, California: Morgan Kaufmann Publishers, 1990, pp. 598--605. Seminal pruning paper.
J. Moody: "The Effective Number of Parameters: An Analysis of Generalization and Regularization
in Nonlinear Learning Systems". In J.E. Moody, S.J. Hanson, R.P. Lippmann (eds.) Advances in Neural Information
Processing Systems 4, Proceedings of the 1991 Conference,
San Mateo, California: Morgan Kaufmann Publishers, 1992, pp. 847--854. Seminal paper on application of generalization error estimates.
N. Murata, S. Yoshizawa and S. Amari: "Network
Information Criterion --- Determining the Number of Hidden Units for an Artificial Neural Network Model".
IEEE Transactions on Neural Networks, vol. 5, no. 6, pp. 865--872, Nov. 1994. Rather detailed review of generalization error estimates for general neural models
C. Svarer, L.K. Hansen, and J. Larsen: "On Design and Evaluation of Tapped Delay Line Networks".
In Proceedings of the 1993 IEEE International Conference on Neural Networks, San Francisco, vol. 1, pp. 46--51, 1993.
First paper to give a detailed recipe for pruning and
evaluation of networks for time series prediction.
A.S. Weigend, B.A. Huberman and D.E. Rumelhart:
"Predicting the Future: A Connectionist Approach". International Journal of Neural Systems, vol. 1, no. 3, pp. 193--209, 1990. Widely recognised contribution on time series prediction by neural networks
A.S. Weigend and N. Gershenfeld: "Time Series Analysis: Predicting the Future and Understanding the Past". Lecture Notes Santa Fe Institute, Addison Wesley (1994).
Great book on modeling and forecasting in time series.
H. White: "Learning in Artificial Neural Networks: A Statistical Perspective". Neural Computation, vol. 1, pp. 425--464, 1989. Great paper with a statistical analysis
of generalization