Table of Contents for Jan Larsen's Ph.D. Thesis:
Design of Neural Network Filters

SYNOPSIS
ABSTRACT
PREFACE
SYMBOLS, REFERENCE INDICATIONS AND ABBREVIATIONS
1 INTRODUCTION
   1.1 Historical Outline
   1.2 Nonlinear Models
   1.3 Fields of Application
   1.4 Designing Neural Network Filters
   1.5 Summary
2 NONLINEAR FILTER ANALYSIS
   2.1 Basic Properties of Nonlinear Filters
      2.1.1 Superposition
      2.1.2 Time-Invariance
      2.1.3 Stability
      2.1.4 Causality
   2.2 Analysis of Nonlinear Filters
      2.2.1 Nonlinear Filters Based on Zero-Memory Nonlinear Filters
      2.2.2 Volterra Filters and Frequency-Domain Interpretations
      2.2.3 Concluding Remarks on Nonlinear Filter Analysis
   2.3 Summary
3 NONLINEAR FILTER ARCHITECTURES
   3.1 Taxonomy of Nonlinear Filter Architectures
      3.1.1 Parametric and Nonparametric Architectures
      3.1.2 Local and Global Approximation
      3.1.3 Orthogonal Architectures
      3.1.4 Classification of Filter Architectures
   3.2 Global Approximation
      3.2.1 Polynomial Filters
      3.2.2 Multi-Layer Neural Networks
      3.2.3 Gram-Schmidt Neural Networks
      3.2.4 Canonical Piecewise-Linear Filters
      3.2.5 Semilocal Units
      3.2.6 Projection Pursuit
      3.2.7 Neural Network with FIR/IIR Synapses
   3.3 Local Approximation
      3.3.1 Localized Receptive Field Networks
      3.3.2 Tree-Structured Piecewise-Linear Filter
      3.3.3 Gate Function Filters
   3.4 Nonparametric Approximation
      3.4.1 Local Filters
      3.4.2 Parzen Window Regression
   3.5 Discussion
   3.6 Summary
4 A GENERIC NONLINEAR FILTER ARCHITECTURE BASED ON NEURAL NETWORKS
   4.1 The Generic Neural Network Architecture
   4.2 Preprocessing Methods
      4.2.1 Preprocessing Algorithm
   4.3 Principal Component Analysis
   4.4 Derivative Preprocessor
   4.5 Laguerre Function Preprocessor
   4.6 Summary
5 ALGORITHMS FOR FILTER PARAMETER ESTIMATION
   5.1 Parameter Estimation
      5.1.1 Consistency
      5.1.2 Least Squares Estimation
      5.1.3 Maximum Likelihood Estimation
   5.2 Performing Least Squares Estimation
   5.3 Gradient Descent Algorithm
      5.3.1 Convergence
      5.3.2 Weight Initialization
      5.3.3 The Back-Propagation Algorithm
      5.3.4 Back-Propagation in the MFPNN
   5.4 Stochastic Gradient Algorithm
      5.4.1 Convergence and Step-Size Selection
      5.4.2 The SG-algorithm and Computational Complexity
   5.5 The Modified Gauss-Newton Algorithm
   5.6 Recursive Gauss-Newton Algorithm
   5.7 Recursive Gauss-Newton Algorithm with Bierman Factorization
   5.8 Summary
6 FILTER ARCHITECTURE SYNTHESIS
   6.1 System and Model
   6.2 A Priori Knowledge
   6.3 Generalization Ability
      6.3.1 Alternative Definitions of Generalization Ability
      6.3.2 Average Generalization Error
      6.3.3 Decomposition of the Generalization Error
      6.3.4 Model Error Decomposition
      6.3.5 Bias/Variance Decomposition
      6.3.6 Simple Invariance Property of the Generalization Error within MFPNN
   6.4 Fundamental Limitations
      6.4.1 Complete Models
      6.4.2 Incomplete Models
   6.5 Generalization Error Estimates
      6.5.1 Basic Architecture Synthesis Algorithm
      6.5.2 The Mean Square Training Error
      6.5.3 Cross-Validation
      6.5.4 Leave-One-Out Cross-Validation
      6.5.5 An Information Theoretic Criterion
      6.5.6 The Final Prediction Error Estimator
      6.5.7 The Generalized Prediction Error Estimator
      6.5.8 The Generalization Error Estimator for Incomplete, Nonlinear Models
   6.6 Statistical Methods for Improving Generalization
      6.6.1 Linear Hypotheses
      6.6.2 Hypothesis Testing
      6.6.3 Asymptotic Distribution of the Weight Estimate
      6.6.4 Irregular Hypotheses
      6.6.5 Retraining
      6.6.6 Similarities Between the Statistical Framework and Generalization Error Estimates
   6.7 Procedures for Pruning the Architecture
      6.7.1 Simple Pruning in Neural Networks
      6.7.2 Statistically Based Pruning Procedures
      6.7.3 Statistical Pruning Algorithms
      6.7.4 Pruning Algorithms Based on Generalization Error Estimates
      6.7.5 Optimal Brain Damage
      6.7.6 Optimal Brain Surgeon
   6.8 Procedures for Expanding the Architecture
      6.8.1 Stepwise Forward Inclusion
      6.8.2 Cascade-Correlation
      6.8.3 Statistical Expansion Test
      6.8.4 Partition Function Filters
   6.9 Reducing Generalization Error by Regularization
   6.10 Summary
7 VALIDATION OF ARCHITECTURE SYNTHESIS METHODS
   7.1 Validation of the GEN-estimate
      7.1.1 Simulation Setup
      7.1.2 Simulated Systems
      7.1.3 Simulation Results
   7.2 Validation of Statistically Based Pruning Algorithms
      7.2.1 Simulation Setup
      7.2.2 Results
   7.3 Summary
8 SIMULATION OF ARTIFICIAL SYSTEMS
   8.1 Validating the Weight Initialization Algorithm
   8.2 Comparison of Parameter Estimation Algorithms
   8.3 Testing Neural Networks for Signal Processing Tasks
      8.3.1 System Identification
      8.3.2 Inverse Modeling
      8.3.3 Time-Series Prediction
      8.3.4 Modeling the Simulated Systems
   8.4 Summary
9 CONCLUSION
A GENERALIZATION ERROR ESTIMATES FOR XN-MODELS
   A.1 The Basis of Estimating the Generalization Error
      A.1.1 Systems and Models
      A.1.2 Estimation of Model Parameters
      A.1.3 Generalization Ability
   A.2 Derivation of Generalization Error Estimates
      A.2.1 LS Cost Function
      A.2.2 LS Cost Function with Regularization Term
   A.3 Summary
B APPROXIMATION OF INVERSE STOCHASTIC MATRICES
   B.1 Approximation of H_N^-1
   B.2 Approximation of E{H_N^-1}
      B.2.1 On the Large N Assumption
      B.2.2 LX-models
   B.3 Summary
C EXPECTATION OF PRODUCT-SUMS OF STOCHASTIC MATRICES
D EVALUATION OF GAUSSIAN INTEGRALS
   D.1 One- and Two-Dimensional Gaussian Integrals
   D.2 Generalization Error in a Simple Neural Model
      D.2.1 The Term G_1
      D.2.2 The Term G_2
      D.2.3 The Term G_3
   D.3 Summary
E MOMENTS OF GAUSSIAN STOCHASTIC VECTORS
   E.1 The Hessian of a Polynomial Filter
   E.2 Moment Calculations
F STUDIES OF THE WEIGHT FLUCTUATION PENALTY
   F.1 On the Changes in WFP Due to Model Complexity
   F.2 The WFP when Dealing with Insignificant Weights
      F.2.1 WFP of the Unrestricted Model
      F.2.2 WFP of the Restricted Model
G REDUCING GENERALIZATION ERROR BY REGULARIZATION
   G.1 System and Model
   G.2 Mean Square Model Error
   G.3 Weight Fluctuation Penalty
   G.4 Optimizing the Regularization Parameter
H PAPER 1: A NEURAL ARCHITECTURE FOR ADAPTIVE FILTERING
   H.1 Introduction
   H.2 Nonlinear Filter Architecture
   H.3 Filter Design
      H.3.1 Signal Dependence
      H.3.2 Preprocessing Methods
      H.3.3 Memoryless Multidimensional Nonlinearities
      H.3.4 Weight Estimation Algorithms
   H.4 Simulations
      H.4.1 Simulated Systems
      H.4.2 Numerical Results
   H.5 Conclusion
   H.6 Acknowledgments
I PAPER 2: A GENERALIZATION ERROR ESTIMATE
   I.1 Introduction
   I.2 Estimate for Incomplete, Nonlinear Models
   I.3 Numerical Experiments
      I.3.1 Linear System
      I.3.2 Simple Neural Network
   I.4 Conclusion
   I.5 Acknowledgments
BIBLIOGRAPHY