This M.Sc. thesis by Carl Edward Rasmussen is entitled `Generalization in Neural Networks'. The PostScript file contains about 80 pages.

ABSTRACT

This report is concerned with methods for optimizing the generalization ability of neural networks. The framework is developed to deal with regression-type problems, where the networks are trained on a limited amount of noisy data. In this context the problem can be formulated as finding the optimal trade-off between data fit and model complexity. Two paradigms for reducing model complexity are discussed: pruning and weight decay. It is shown by numerical experiments that the application of weight decay is essential for obtaining good generalization performance. This is explained by the way in which weight decay confines the space of possible networks to a space of `reasonable' networks.

Two methods for making statistical estimates of the generalization performance {\it without}\/ the use of validation sets are presented: the Generalization method and the Bayesian method. The advantage of not needing validation sets is that all available data can be utilized in the training phase. This feature is important since the optimal generalization ability of a model is directly related to the amount of available training data. The Generalization method is an extension of Akaike's FPE estimator which explicitly takes the application of weight decay into account. In this method the generalization ability is estimated by averaging over the ensemble of possible training sets that are consistent with the `true' function. The method allows for the optimal setting of the weight decay parameters and provides an estimate of the generalization performance. The Bayesian framework uses the {\it evidence}\/ to measure the plausibility of models. This framework embodies {\it priors}\/ on the network weights, which are shown to be equivalent to the application of weight decay.

The two methods are tested numerically on the sunspot problem. For linear models on this (limited) problem both methods work well, and marked similarities are found between the two methods. Both methods also exhibit the ability to prune weights when individual weight decay parameters are applied to each weight in the network.
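As a sketch of the trade-off between data fit and model complexity mentioned above (the symbols $E_D$, $w_i$ and $\alpha$ are generic and not taken from the thesis), training with weight decay minimizes a cost of the form

$$ C(w) \;=\; E_D(w) \;+\; \alpha \sum_i w_i^2 , $$

where $E_D(w)$ is the error on the training data, the $w_i$ are the network weights and $\alpha$ is the weight decay parameter. A small $\alpha$ favours fitting the data closely, while a large $\alpha$ favours networks with small weights, i.e. lower effective complexity.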
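For orientation, Akaike's classical FPE estimator (the starting point, stated here without weight decay) predicts the generalization error from the training error alone: with $N$ training examples, $p$ free parameters and average training error $E_{\rm train}$,

$$ {\rm FPE} \;=\; E_{\rm train}\,\frac{N+p}{N-p} . $$

The Generalization method of the thesis extends this kind of estimate to networks trained with weight decay; the exact form of that extension is given in the thesis itself and is not reproduced here.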
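The correspondence between weight priors and weight decay can be sketched as follows (generic notation, not necessarily that of the thesis): a Gaussian prior $p(w) \propto \exp(-\frac{\alpha}{2}\sum_i w_i^2)$, combined with a Gaussian noise model for the data, gives a negative log posterior

$$ -\log p(w\,|\,D) \;=\; \beta E_D(w) \;+\; \frac{\alpha}{2}\sum_i w_i^2 \;+\; {\rm const} , $$

so the most probable weights are exactly those that minimize a weight decay cost function. The {\it evidence}\/ for a model is the corresponding normalizing integral, $p(D) = \int p(D\,|\,w)\,p(w)\,dw$, which can be used to compare models and to set the weight decay parameter $\alpha$.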