Table of Contents for Jan Larsen's Ph.D. Thesis:
Design of Neural Network Filters

SYNOPSIS
ABSTRACT
PREFACE
SYMBOLS, REFERENCE INDICATIONS AND ABBREVIATIONS
1 INTRODUCTION
   1.1 Historical Outline
   1.2 Nonlinear Models
   1.3 Fields of Application
   1.4 Designing Neural Network Filters
   1.5 Summary
2 NONLINEAR FILTER ANALYSIS
   2.1 Basic Properties of Nonlinear Filters
      2.1.1 Superposition
      2.1.2 Time-Invariance
      2.1.3 Stability
      2.1.4 Causality
   2.2 Analysis of Nonlinear Filters
      2.2.1 Nonlinear Filters Based on Zero-Memory Nonlinear Filters
      2.2.2 Volterra Filters and Frequency-Domain Interpretations
      2.2.3 Concluding Remarks on Nonlinear Filter Analysis
   2.3 Summary
3 NONLINEAR FILTER ARCHITECTURES
   3.1 Taxonomy of Nonlinear Filter Architectures
      3.1.1 Parametric and Nonparametric Architectures
      3.1.2 Local and Global Approximation
      3.1.3 Orthogonal Architectures
      3.1.4 Classification of Filter Architectures
   3.2 Global Approximation
      3.2.1 Polynomial Filters
      3.2.2 Multi-Layer Neural Networks
      3.2.3 Gram-Schmidt Neural Networks
      3.2.4 Canonical Piecewise-Linear Filters
      3.2.5 Semilocal Units
      3.2.6 Projection Pursuit
      3.2.7 Neural Network with FIR/IIR Synapses
   3.3 Local Approximation
      3.3.1 Localized Receptive Field Networks
      3.3.2 Tree-Structured Piecewise-Linear Filter
      3.3.3 Gate Function Filters
   3.4 Nonparametric Approximation
      3.4.1 Local Filters
      3.4.2 Parzen Window Regression
   3.5 Discussion
   3.6 Summary
4 A GENERIC NONLINEAR FILTER ARCHITECTURE BASED ON NEURAL NETWORKS
   4.1 The Generic Neural Network Architecture
   4.2 Preprocessing Methods
      4.2.1 Preprocessing Algorithm
   4.3 Principal Component Analysis
   4.4 Derivative Preprocessor
   4.5 Laguerre Function Preprocessor
   4.6 Summary
5 ALGORITHMS FOR FILTER PARAMETER ESTIMATION
   5.1 Parameter Estimation
      5.1.1 Consistency
      5.1.2 Least Squares Estimation
      5.1.3 Maximum Likelihood Estimation
   5.2 Performing Least Squares Estimation
   5.3 Gradient Descent Algorithm
      5.3.1 Convergence
      5.3.2 Weight Initialization
      5.3.3 The Back-Propagation Algorithm
      5.3.4 Back-Propagation in the MFPNN
   5.4 Stochastic Gradient Algorithm
      5.4.1 Convergence and Step-Size Selection
      5.4.2 The SG-algorithm and Computational Complexity
   5.5 The Modified Gauss-Newton Algorithm
   5.6 Recursive Gauss-Newton Algorithm
   5.7 Recursive Gauss-Newton Algorithm with Bierman Factorization
   5.8 Summary
6 FILTER ARCHITECTURE SYNTHESIS
   6.1 System and Model
   6.2 A Priori Knowledge
   6.3 Generalization Ability
      6.3.1 Alternative Definitions of Generalization Ability
      6.3.2 Average Generalization Error
      6.3.3 Decomposition of the Generalization Error
      6.3.4 Model Error Decomposition
      6.3.5 Bias/Variance Decomposition
      6.3.6 Simple Invariance Property of the Generalization Error within MFPNN
   6.4 Fundamental Limitations
      6.4.1 Complete Models
      6.4.2 Incomplete Models
   6.5 Generalization Error Estimates
      6.5.1 Basic Architecture Synthesis Algorithm
      6.5.2 The Mean Square Training Error
      6.5.3 Cross-Validation
      6.5.4 Leave-One-Out Cross-Validation
      6.5.5 An Information Theoretic Criterion
      6.5.6 The Final Prediction Error Estimator
      6.5.7 The Generalized Prediction Error Estimator
      6.5.8 The Generalization Error Estimator for Incomplete, Nonlinear Models
   6.6 Statistical Methods for Improving Generalization
      6.6.1 Linear Hypotheses
      6.6.2 Hypothesis Testing
      6.6.3 Asymptotic Distribution of the Weight Estimate
      6.6.4 Irregular Hypotheses
      6.6.5 Retraining
      6.6.6 Similarities Between the Statistical Framework and Generalization Error Estimates
   6.7 Procedures for Pruning the Architecture
      6.7.1 Simple Pruning in Neural Networks
      6.7.2 Statistically Based Pruning Procedures
      6.7.3 Statistical Pruning Algorithms
      6.7.4 Pruning Algorithms Based on Generalization Error Estimates
      6.7.5 Optimal Brain Damage
      6.7.6 Optimal Brain Surgeon
   6.8 Procedures for Expanding the Architecture
      6.8.1 Stepwise Forward Inclusion
      6.8.2 Cascade-Correlation
      6.8.3 Statistical Expansion Test
      6.8.4 Partition Function Filters
   6.9 Reducing Generalization Error by Regularization
   6.10 Summary
7 VALIDATION OF ARCHITECTURE SYNTHESIS METHODS
   7.1 Validation of the GEN-estimate
      7.1.1 Simulation Setup
      7.1.2 Simulated Systems
      7.1.3 Simulation Results
   7.2 Validation of Statistically Based Pruning Algorithms
      7.2.1 Simulation Setup
      7.2.2 Results
   7.3 Summary
8 SIMULATION OF ARTIFICIAL SYSTEMS
   8.1 Validating the Weight Initialization Algorithm
   8.2 Comparison of Parameter Estimation Algorithms
   8.3 Testing Neural Networks for Signal Processing Tasks
      8.3.1 System Identification
      8.3.2 Inverse Modeling
      8.3.3 Time-Series Prediction
      8.3.4 Modeling the Simulated Systems
   8.4 Summary
9 CONCLUSION
A GENERALIZATION ERROR ESTIMATES FOR XN-MODELS
   A.1 The Basis of Estimating the Generalization Error
      A.1.1 Systems and Models
      A.1.2 Estimation of Model Parameters
      A.1.3 Generalization Ability
   A.2 Derivation of Generalization Error Estimates
      A.2.1 LS Cost Function
      A.2.2 LS Cost Function with Regularization Term
   A.3 Summary
B APPROXIMATION OF INVERSE STOCHASTIC MATRICES
   B.1 Approximation of H_N^-1
   B.2 Approximation of E{H_N^-1}
      B.2.1 On the Large N Assumption
      B.2.2 LX-models
   B.3 Summary
C EXPECTATION OF PRODUCT-SUMS OF STOCHASTIC MATRICES
D EVALUATION OF GAUSSIAN INTEGRALS
   D.1 One- and Two-Dimensional Gaussian Integrals
   D.2 Generalization Error in a Simple Neural Model
      D.2.1 The Term G_1
      D.2.2 The Term G_2
      D.2.3 The Term G_3
   D.3 Summary
E MOMENTS OF GAUSSIAN STOCHASTIC VECTORS
   E.1 The Hessian of a Polynomial Filter
   E.2 Moment Calculations
F STUDIES OF THE WEIGHT FLUCTUATION PENALTY
   F.1 On the Changes in WFP Due to Model Complexity
   F.2 The WFP when Dealing with Insignificant Weights
      F.2.1 WFP of the Unrestricted Model
      F.2.2 WFP of the Restricted Model
G REDUCING GENERALIZATION ERROR BY REGULARIZATION
   G.1 System and Model
   G.2 Mean Square Model Error
   G.3 Weight Fluctuation Penalty
   G.4 Optimizing the Regularization Parameter
H PAPER 1: A NEURAL ARCHITECTURE FOR ADAPTIVE FILTERING
   H.1 Introduction
   H.2 Nonlinear Filter Architecture
   H.3 Filter Design
      H.3.1 Signal Dependence
      H.3.2 Preprocessing Methods
      H.3.3 Memoryless Multidimensional Nonlinearities
      H.3.4 Weight Estimation Algorithms
   H.4 Simulations
      H.4.1 Simulated Systems
      H.4.2 Numerical Results
   H.5 Conclusion
   H.6 Acknowledgments
I PAPER 2: A GENERALIZATION ERROR ESTIMATE
   I.1 Introduction
   I.2 Estimate for Incomplete, Nonlinear Models
   I.3 Numerical Experiments
      I.3.1 Linear System
      I.3.2 Simple Neural Network
   I.4 Conclusion
   I.5 Acknowledgments
BIBLIOGRAPHY