Abstract for With_PhD_thesis_97.ps:
The present thesis is about the optimization of recurrent neural networks applied to
time series modeling. In particular, we consider fully recurrent networks with a
single external input, one layer of nonlinear hidden units, and a linear output
unit, applied to the prediction of discrete time series. The overall objectives
are to improve training by application of second-order methods and to improve
generalization ability by architecture optimization accomplished by pruning. The
major topics covered in the thesis are:
* The problem of training recurrent networks is analyzed from a numerical point of
view. In particular, it is analyzed how numerical ill-conditioning of the Hessian
matrix can arise.
* Training is significantly improved by application of the damped Gauss-Newton
method, involving the {\em full} Hessian. This method is found to outperform
gradient descent in terms of both the quality of the solutions obtained and the
computation time required.
* A theoretical definition of the generalization error for recurrent networks
is provided. This definition justifies a commonly adopted approach for estimating
generalization ability.
* The viability of pruning recurrent networks by the Optimal Brain Damage (OBD) and
Optimal Brain Surgeon (OBS) pruning schemes is investigated. OBD is found to be
very effective, whereas OBS is severely affected by numerical problems which lead
to the pruning of important weights.
* A novel operational tool for examination of the internal memory of recurrent
networks is proposed. The tool allows one to assess the effective memory length,
i.e., how far back previous inputs influence the recurrent network during operation.
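A single step of the damped Gauss-Newton method mentioned above can be sketched as follows. This is a minimal illustration for a generic least-squares loss; the function name, shapes, and damping value are assumptions, and the sketch uses the standard Gauss-Newton approximation of the Hessian rather than the thesis's full-Hessian variant.

```python
import numpy as np

def damped_gauss_newton_step(J, r, weights, damping=1e-3):
    """One damped Gauss-Newton update for a least-squares loss 0.5*||r||^2.

    J       : Jacobian of the residuals w.r.t. the weights, (n_samples, n_weights)
    r       : residual vector (network output minus target), (n_samples,)
    weights : current weight vector, (n_weights,)
    damping : regularizing term added to the Gauss-Newton Hessian so the
              linear system stays well-conditioned (cf. the ill-conditioning
              issues discussed in the thesis)
    """
    H = J.T @ J + damping * np.eye(J.shape[1])  # damped Gauss-Newton Hessian
    g = J.T @ r                                 # gradient of 0.5*||r||^2
    return weights - np.linalg.solve(H, g)
```

For a linear model the residuals are linear in the weights, so with zero damping a single step lands on the exact least-squares solution; for a recurrent network the step is repeated, with the damping adapted as training proceeds.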
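The OBD pruning scheme investigated above can likewise be sketched in a few lines. This is a schematic of the standard OBD saliency ranking, not the thesis's implementation; the function name and `n_prune` parameter are illustrative.

```python
import numpy as np

def obd_prune(weights, hessian_diag, n_prune):
    """Zero out the n_prune least salient weights, Optimal Brain Damage style.

    The OBD saliency of weight w_i is 0.5 * H_ii * w_i**2: the estimated
    increase in training error when w_i is set to zero, using only the
    diagonal of the Hessian (OBS, by contrast, uses the full inverse Hessian).
    """
    saliency = 0.5 * hessian_diag * weights**2
    prune_idx = np.argsort(saliency)[:n_prune]  # smallest saliencies first
    pruned = weights.copy()
    pruned[prune_idx] = 0.0
    return pruned, prune_idx
```

In practice pruning is interleaved with retraining: a few weights are removed, the network is retrained to a new minimum, and the Hessian diagonal is re-estimated before the next pruning step.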
Time series modeling is also treated from a more general point of view, namely
modeling of the joint probability distribution of the observed series. Two
recurrent models rooted in statistical physics are considered in this respect,
namely the ``Boltzmann chain'' and the ``Boltzmann zipper'', and a comprehensive
tutorial on these models is provided. Boltzmann chains and zippers are likewise
found to benefit from second-order training and architecture optimization by
pruning, as illustrated on artificial problems and a small speech recognition task.