Advanced methods for non-parametric modelling
Carl Edward Rasmussen
for digital signal processing
In this course we will discuss various topics for non-linear, non-parametric
modelling. We will discuss the fundamental issues involved in modelling
and describe some prominent methods including both non-Bayesian and Bayesian
approaches to neural networks. The choice of topics is not exhaustive,
but rather governed by our experience and personal inclinations.
Place: The lectures will take place in building 305, room 205 (2nd floor).
The assignments will be performed in the terminal room in the basement of
building 305 using the section's Linux machine.
Time: The course will take place in week 3 (January 18-22). It will
consist of 11 lectures given in English, 1.5 hours each, and computer exercises
for 3 to 4 hours on the assignment days. The course is based on textbook material
as well as discussion of recent research papers.
See also the administration page for the course.
Basic readings: These references are given as an indication of the material
covered last year. They will change in the coming weeks. This is meant
as a list of work you can refer to after the course if you want to get
more information about a given topic.
On numerical optimisation:
Chapters 2 to 4 (pp. 12-94) of: Fletcher (1987) Practical methods of
optimization, 2nd ed., John Wiley, New York.
Chapter 10 (pp. 394-455) of: Press et al. (1992) Numerical Recipes
in C, 2nd ed., Cambridge. on-line
Appendix B (pp. 121-125) of: Rasmussen, C.E. (1996) Evaluation of
Gaussian processes and other methods for non-linear regression, PhD
thesis, U. of Toronto. postscript
On generalisation estimation:
Chapter 17 of: Efron, B. and Tibshirani, R. (1993) An Introduction
to the Bootstrap, Monographs on Statistics and Applied Probability 57, Chapman & Hall.
Chapter 3 "Hyper-parameters" (pp. 41-56) of: Cyril Goutte (1997) Statistical
learning and regularisation, PhD thesis, U. Paris 6.
Chapters 3 and 5 (pp. 58-89 and 114-145) of: Wand, M.P. and Jones, M.C.
(1995) Kernel Smoothing, Monographs on Statistics and Applied Probability
On multivariate kernel regression:
Lowe, D. (1995) Similarity metric learning for a variable-kernel
classifier, Neural Computation 7:1, pp. 72-85.
Goutte, C. and Larsen, J. (1998) Adaptive Metric Kernel Regression,
Neural Networks for Signal Processing VIII, Proceedings of the 1998 IEEE
workshop (Cambridge UK, Sept. 1998), pp. 184-193.
Chapter 2 "Regularisation" (pp. 19-40) of: Cyril Goutte (1997) Statistical learning
and regularisation, PhD thesis, U. Paris 6.
Chapter 4 of C. Bishop (1995) Neural Networks for Pattern Recognition,
Oxford U. Press.
On MCMC for neural networks:
Carl Edward Rasmussen (1996) A Practical Monte Carlo Implementation
of Bayesian Learning, Advances in Neural Information Processing Systems
8, eds. D. S. Touretzky, M. C. Mozer, M. E. Hasselmo, MIT Press: postscript.
Chapters 3, 4 and 5 (pp. 30-86) of: Radford M. Neal (1993) Probabilistic
inference using Markov chain Monte Carlo methods, Technical Report
CRG-TR-93-1, Dept. of Computer Science, University of Toronto, 144 pages:
table of contents, postscript.
David J. C. MacKay (1997) Introduction to Monte Carlo Methods,
review paper to appear in the proceedings of an Erice summer school, ed.
M. Jordan: abstract,
Radford M. Neal (1996) Bayesian Learning for Neural Networks,
Lecture Notes in Statistics 118, Springer-Verlag New York: info.
On RBF nets & Mixture models: Chapters 2, 5 & 9 of Bishop
(1995) Neural Networks for Pattern Recognition. Titterington
et al. give a more in-depth statistical treatment: D.M. Titterington,
A.F.M. Smith and U.E. Makov, Statistical Analysis of Finite Mixture
Distributions, John Wiley & Sons, 1985. Hierarchical mixtures
of experts are described in M. I. Jordan and R. A. Jacobs, Hierarchical
mixtures of experts and the EM algorithm.
The multiple model approach is reviewed in:
T. A. Johansen and R. Murray-Smith, The Operating Regime Approach
to Nonlinear Modelling and Control, pp. 3-72, in R. Murray-Smith and
T. A. Johansen, eds., Multiple Model Approaches to Modelling and Control,
Taylor and Francis, 1997.
Online version at ftp://eivind.imm.dtu.dk/pub/rod-bookch1.ps.gz
On Gaussian Processes:
Christopher K. I. Williams & Carl Edward Rasmussen (1996) Gaussian
Processes for Regression, Advances in Neural Information Processing
Systems 8, eds. D. S. Touretzky, M. C. Mozer, M. E. Hasselmo, MIT Press.
Chapter 4 (pp. 49-67) of: Carl Edward Rasmussen (1996) Evaluation
of Gaussian processes and other methods for non-linear regression,
PhD thesis, U. of Toronto: postscript.
Radford M. Neal (1997) Monte Carlo implementation of Gaussian process
models for Bayesian regression and classification, Technical Report
No. 9702, Dept. of Statistics (January 1997), 24 pages: abstract.
Datasets, Matlab functions and assignments are available on a separate page.
The topics covered will be:
probabilistic model fitting,
Markov Chain methods.
We will focus our attention on the following non-parametric models:
Radial Basis Function networks,
Gaussian mixture models.
Monday 18, 9.00- 9.30: Course introduction
Monday 18, 9.40-12.00: Numerical methods for model
One-dimensional optimisation and line search, Multi-dimensional optimisation,
steepest descent, Newton and quasi-Newton, conjugate gradient.
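As an illustration of the first two topics, the following is a minimal sketch of steepest descent combined with a backtracking line search. It is not taken from the course material (the assignments use Matlab); the function names, the test function, the tolerance and the sufficient-decrease constant 1e-4 are all assumptions chosen for the example.

```python
def steepest_descent(f, grad, x0, tol=1e-8, max_iter=10000):
    """Minimise f by steepest descent with a backtracking line search."""
    x = list(x0)
    for iteration in range(max_iter):
        g = grad(x)
        g_norm_sq = sum(gi * gi for gi in g)
        if g_norm_sq < tol * tol:
            break  # gradient is (numerically) zero: stop
        fx = f(x)
        step = 1.0
        # Halve the step until the sufficient-decrease condition holds.
        while True:
            x_new = [xi - step * gi for xi, gi in zip(x, g)]
            if f(x_new) <= fx - 1e-4 * step * g_norm_sq:
                break
            step *= 0.5
        x = x_new
    return x, iteration


# Hypothetical test problem: an ill-conditioned quadratic bowl.
def quadratic(x):
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)


def quadratic_grad(x):
    return [x[0], 10.0 * x[1]]


minimum, n_iter = steepest_descent(quadratic, quadratic_grad, [1.0, 1.0])
print(minimum, n_iter)
```

The slow progress along the shallow direction of the bowl is the usual motivation for the Newton, quasi-Newton and conjugate gradient methods listed above.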
Monday 18, 14.00-15.20: Introduction to Neural Networks
The multi-layer perceptron (MLP), training of MLP, regularisation techniques,
parameter selection (pruning).
Monday 18, 15.30-17.00: Generalisation and Regularisation
Well-posed and ill-posed problems, regularisation, example on linear models.
Risk minimisation, generalisation, generalisation bounds, generalisation
Tuesday 19, 9.00-13.00: Assignment 1: Optimisation and NN (in Matlab)
Comparison of optimisation methods on a simple problem.
Training a neural network.
Experiments on over-fitting.
Tuesday 19, 14.00-15.20: Kernel methods
Kernel density estimation, kernel regression, local smoothing, bandwidth
estimation, multivariate regression, variable metric.
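For readers unfamiliar with kernel regression, a minimal one-dimensional sketch is given below. It assumes the Nadaraya-Watson form with a Gaussian kernel and a fixed bandwidth; the function names, the toy data and the bandwidth value are illustrative assumptions, not part of the course material.

```python
import math


def gaussian_kernel(u):
    """Standard Gaussian kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_regression(x_query, xs, ys, bandwidth):
    """Nadaraya-Watson estimate: a kernel-weighted average of the targets."""
    weights = [gaussian_kernel((x_query - x) / bandwidth) for x in xs]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, ys)) / total


# Hypothetical toy data: noise-free samples of sin(x) on a regular grid.
xs = [i / 10.0 for i in range(21)]
ys = [math.sin(x) for x in xs]
estimate = kernel_regression(1.0, xs, ys, bandwidth=0.1)
print(estimate, math.sin(1.0))
```

The bandwidth controls the amount of smoothing: a larger value averages over more neighbours and increases bias, while a smaller value follows the data more closely and increases variance.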
Tuesday 19, 15.30-17.00: Introduction to Radial Basis Function networks & Mixture models
Multiple component approaches, Basis function models, Mixture models, Expectation Maximisation (EM)
Wednesday 20, 9.00-13.00: Assignment 2: Kernel and RBF (in Matlab)
Multivariate kernel regression.
Mixture of Gaussians for probability density function estimation.
Mixture of linear models for regression.
RBF net for regression.
Wednesday 20, 14.00-15.20: Bayesian inference
Wednesday 20, 15.30-17.00: Bayesian training of
Thursday 21, 9.00-13.00: Assignment 3: MCMC and
Bayesian training of neural networks using Radford Neal's flexible Bayesian
Thursday 21, 14.00-15.20: Gaussian Processes
Thursday 21, 15.30-17.00: Bayesian training of
Friday 22, 9.00-13.00: Assignment 4: Gaussian Processes
MAP training of Gaussian Processes.
Bayesian training of GP using Radford Neal's fbm software.
Friday 22, 14.00-15.20: Local models, effective degrees of freedom & equivalent kernels
Mixture models where each component has a local linear function instead of a constant weight.
Effective degrees of freedom.
Equivalent kernel interpretation of linear-in-the-parameters identification of basis function models.
Friday 22, 15.30-17.00: Infinite Gaussian mixture
Last modified October 29, 1998
Copyright 1998 by Section for DSP, IMM.