SYNOPSIS | i |
ABSTRACT | iii |
PREFACE | ix |
SYMBOLS, REFERENCE INDICATIONS AND ABBREVIATIONS | xiii |
1 INTRODUCTION | 1 |
1.1 Historical Outline | 2 |
1.2 Nonlinear Models | 3 |
1.3 Fields of Application | 7 |
1.4 Designing Neural Network Filters | 11 |
1.5 Summary | 13 |
2 NONLINEAR FILTER ANALYSIS | 15 |
2.1 Basic Properties of Nonlinear Filters | 15 |
2.1.1 Superposition | 16 |
2.1.2 Time-Invariance | 16 |
2.1.3 Stability | 18 |
2.1.4 Causality | 18 |
2.2 Analysis of Nonlinear Filters | 18 |
2.2.1 Nonlinear Filters Based on Zero-Memory Nonlinear Filters | 20 |
2.2.2 Volterra Filters and Frequency-Domain Interpretations | 23 |
2.2.3 Concluding Remarks on Nonlinear Filter Analysis | 26 |
2.3 Summary | 27 |
3 NONLINEAR FILTER ARCHITECTURES | 28 |
3.1 Taxonomy of Nonlinear Filter Architectures | 28 |
3.1.1 Parametric and Nonparametric Architectures | 29 |
3.1.2 Local and Global Approximation | 30 |
3.1.3 Orthogonal Architectures | 35 |
3.1.4 Classification of Filter Architectures | 39 |
3.2 Global Approximation | 39 |
3.2.1 Polynomial Filters | 39 |
3.2.2 Multi-Layer Neural Networks | 49 |
3.2.3 Gram-Schmidt Neural Networks | 65 |
3.2.4 Canonical Piecewise-Linear Filters | 68 |
3.2.5 Semilocal Units | 71 |
3.2.6 Projection Pursuit | 73 |
3.2.7 Neural Network with FIR/IIR Synapses | 74 |
3.3 Local Approximation | 75 |
3.3.1 Localized Receptive Field Networks | 75 |
3.3.2 Tree-Structured Piecewise-Linear Filter | 78 |
3.3.3 Gate Function Filters | 80 |
3.4 Nonparametric Approximation | 81 |
3.4.1 Local Filters | 81 |
3.4.2 Parzen Window Regression | 82 |
3.5 Discussion | 83 |
3.6 Summary | 84 |
4 A GENERIC NONLINEAR FILTER ARCHITECTURE BASED ON NEURAL NETWORKS | 85 |
4.1 The Generic Neural Network Architecture | 85 |
4.2 Preprocessing Methods | 87 |
4.2.1 Preprocessing Algorithm | 88 |
4.3 Principal Component Analysis | 92 |
4.4 Derivative Preprocessor | 95 |
4.5 Laguerre Function Preprocessor | 98 |
4.6 Summary | 99 |
5 ALGORITHMS FOR FILTER PARAMETER ESTIMATION | 100 |
5.1 Parameter Estimation | 100 |
5.1.1 Consistency | 103 |
5.1.2 Least Squares Estimation | 104 |
5.1.3 Maximum Likelihood Estimation | 106 |
5.2 Performing Least Squares Estimation | 107 |
5.3 Gradient Descent Algorithm | 111 |
5.3.1 Convergence | 113 |
5.3.2 Weight Initialization | 114 |
5.3.3 The Back-Propagation Algorithm | 120 |
5.3.4 Back-Propagation in the MFPNN | 122 |
5.4 Stochastic Gradient Algorithm | 128 |
5.4.1 Convergence and Step-Size Selection | 129 |
5.4.2 The SG Algorithm and Computational Complexity | 132 |
5.5 The Modified Gauss-Newton Algorithm | 134 |
5.6 Recursive Gauss-Newton Algorithm | 139 |
5.7 Recursive Gauss-Newton Algorithm with Bierman Factorization | 146 |
5.8 Summary | 150 |
6 FILTER ARCHITECTURE SYNTHESIS | 151 |
6.1 System and Model | 151 |
6.2 A Priori Knowledge | 152 |
6.3 Generalization Ability | 152 |
6.3.1 Alternative Definitions of Generalization Ability | 153 |
6.3.2 Average Generalization Error | 154 |
6.3.3 Decomposition of the Generalization Error | 155 |
6.3.4 Model Error Decomposition | 156 |
6.3.5 Bias/Variance Decomposition | 160 |
6.3.6 Simple Invariance Property of the Generalization Error within the MFPNN | 161 |
6.4 Fundamental Limitations | 163 |
6.4.1 Complete Models | 164 |
6.4.2 Incomplete Models | 164 |
6.5 Generalization Error Estimates | 166 |
6.5.1 Basic Architecture Synthesis Algorithm | 166 |
6.5.2 The Mean Square Training Error | 168 |
6.5.3 Cross-Validation | 169 |
6.5.4 Leave-One-Out Cross-Validation | 172 |
6.5.5 An Information Theoretic Criterion | 174 |
6.5.6 The Final Prediction Error Estimator | 175 |
6.5.7 The Generalized Prediction Error Estimator | 177 |
6.5.8 The Generalization Error Estimator for Incomplete, Nonlinear Models | 179 |
6.6 Statistical Methods for Improving Generalization | 195 |
6.6.1 Linear Hypotheses | 195 |
6.6.2 Hypothesis Testing | 196 |
6.6.3 Asymptotic Distribution of the Weight Estimate | 197 |
6.6.4 Irregular Hypotheses | 202 |
6.6.5 Retraining | 203 |
6.6.6 Similarities Between the Statistical Framework and Generalization Error Estimates | 205 |
6.7 Procedures for Pruning the Architecture | 206 |
6.7.1 Simple Pruning in Neural Networks | 206 |
6.7.2 Statistically Based Pruning Procedures | 207 |
6.7.3 Statistical Pruning Algorithms | 214 |
6.7.4 Pruning Algorithms Based on Generalization Error Estimates | 215 |
6.7.5 Optimal Brain Damage | 217 |
6.7.6 Optimal Brain Surgeon | 218 |
6.8 Procedures for Expanding the Architecture | 220 |
6.8.1 Stepwise Forward Inclusion | 221 |
6.8.2 Cascade-Correlation | 222 |
6.8.3 Statistical Expansion Test | 223 |
6.8.4 Partition Function Filters | 227 |
6.9 Reducing Generalization Error by Regularization | 231 |
6.10 Summary | 233 |
7 VALIDATION OF ARCHITECTURE SYNTHESIS METHODS | 235 |
7.1 Validation of the GEN Estimate | 235 |
7.1.1 Simulation Setup | 236 |
7.1.2 Simulated Systems | 242 |
7.1.3 Simulation Results | 253 |
7.2 Validation of Statistically Based Pruning Algorithms | 276 |
7.2.1 Simulation Setup | 277 |
7.2.2 Results | 280 |
7.3 Summary | 280 |
8 SIMULATION OF ARTIFICIAL SYSTEMS | 282 |
8.1 Validating the Weight Initialization Algorithm | 282 |
8.2 Comparison of Parameter Estimation Algorithms | 283 |
8.3 Testing Neural Networks for Signal Processing Tasks | 288 |
8.3.1 System Identification | 288 |
8.3.2 Inverse Modeling | 289 |
8.3.3 Time-Series Prediction | 296 |
8.3.4 Modeling the Simulated Systems | 299 |
8.4 Summary | 310 |
9 CONCLUSION | 315 |
A GENERALIZATION ERROR ESTIMATES FOR XN-MODELS | 321 |
A.1 The Basis of Estimating the Generalization Error | 321 |
A.1.1 Systems and Models | 321 |
A.1.2 Estimation of Model Parameters | 323 |
A.1.3 Generalization Ability | 326 |
A.2 Derivation of Generalization Error Estimates | 327 |
A.2.1 LS Cost Function | 328 |
A.2.2 LS Cost Function with Regularization Term | 346 |
A.3 Summary | 354 |
B APPROXIMATION OF INVERSE STOCHASTIC MATRICES | 355 |
B.1 Approximation of $H_N^{-1}$ | 355 |
B.2 Approximation of $E\{H_N^{-1}\}$ | 358 |
B.2.1 On the Large N Assumption | 360 |
B.2.2 LX-Models | 365 |
B.3 Summary | 367 |
C EXPECTATION OF PRODUCT-SUMS OF STOCHASTIC MATRICES | 368 |
D EVALUATION OF GAUSSIAN INTEGRALS | 370 |
D.1 One- and Two-Dimensional Gaussian Integrals | 370 |
D.2 Generalization Error in a Simple Neural Model | 371 |
D.2.1 The Term $G_1$ | 373 |
D.2.2 The Term $G_2$ | 374 |
D.2.3 The Term $G_3$ | 374 |
D.3 Summary | 375 |
E MOMENTS OF GAUSSIAN STOCHASTIC VECTORS | 376 |
E.1 The Hessian of a Polynomial Filter | 376 |
E.2 Moment Calculations | 377 |
F STUDIES OF THE WEIGHT FLUCTUATION PENALTY | 380 |
F.1 On the Changes in WFP Due to Model Complexity | 380 |
F.2 The WFP when Dealing with Insignificant Weights | 385 |
F.2.1 WFP of the Unrestricted Model | 386 |
F.2.2 WFP of the Restricted Model | 387 |
G REDUCING GENERALIZATION ERROR BY REGULARIZATION | 389 |
G.1 System and Model | 389 |
G.2 Mean Square Model Error | 390 |
G.3 Weight Fluctuation Penalty | 391 |
G.4 Optimizing the Regularization Parameter | 395 |
H PAPER 1: A NEURAL ARCHITECTURE FOR ADAPTIVE FILTERING | 398 |
H.1 Introduction | 399 |
H.2 Nonlinear Filter Architecture | 400 |
H.3 Filter Design | 400 |
H.3.1 Signal Dependence | 400 |
H.3.2 Preprocessing Methods | 401 |
H.3.3 Memoryless Multidimensional Nonlinearities | 402 |
H.3.4 Weight Estimation Algorithms | 403 |
H.4 Simulations | 404 |
H.4.1 Simulated Systems | 404 |
H.4.2 Numerical Results | 405 |
H.5 Conclusion | 406 |
H.6 Acknowledgments | 406 |
I PAPER 2: A GENERALIZATION ERROR ESTIMATE | 408 |
I.1 Introduction | 409 |
I.2 Estimate for Incomplete, Nonlinear Models | 410 |
I.3 Numerical Experiments | 412 |
I.3.1 Linear System | 413 |
I.3.2 Simple Neural Network | 415 |
I.4 Conclusion | 416 |
I.5 Acknowledgments | 417 |
BIBLIOGRAPHY | 418 |