Margin-Like Quantities and Generalized Approximate Cross Validation for Support Vector Machines
It is now common knowledge that the support vector machine (SVM) paradigm, which has proved highly successful in a number of classification studies, can be cast as a variational/regularization problem in a reproducing kernel Hilbert space (RKHS); see Kimeldorf & Wahba (1971), Wahba (1990), Girosi (1997), Poggio & Girosi (1998), the papers and references in Schoelkopf, Burges & Smola (1999), and elsewhere. In this note, which is a sequel to Wahba (1998), we look at the SVM paradigm from the point of view of a regularization problem. This view allows a comparison with penalized likelihood methods, as well as the application of model selection and tuning approaches that have been used with those and other regularization-type algorithms to choose tuning parameters in nonparametric statistical models. We first review the steps connecting the SVM paradigm in an RKHS to the (dual) mathematical programming problem traditional in SVM classification. We then review the Generalized Comparative Kullback-Leibler Distance (GCKL) for the usual SVM paradigm and observe that it is, trivially, a simple upper bound on the expected misclassification rate. Next we revisit the Generalized Approximate Cross Validation (GACV) as a proxy for the GCKL, as proposed in Wahba (1998), and the argument that it is a reasonable estimate of the GCKL. We find that it is not necessary to carry out the randomization step in the GACV of Wahba (1998), because it can be replaced by an equally justifiable approximation which is readily computed exactly, along with the SVM solution to the dual mathematical programming problem. This estimate turns out, interestingly but not surprisingly, to be simply related to what several authors have identified as the (observed) VC dimension of the estimated SVM. Preliminary simulations suggest that the minimizer of the GACV is in fact a reasonable estimate of the minimizer of the GCKL, although further simulation and theoretical studies are warranted. It is hoped that this preliminary work will lead to a better understanding of ``tuning'' issues in the optimization of SVMs and related classifiers.
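As a concrete sketch of the variational problem referred to above (the notation here is assumed, roughly following the usual form in Wahba (1998), rather than fixed by this note): given training data $(x_i, y_i)$, $i = 1, \ldots, n$, with $y_i \in \{-1, +1\}$, the SVM estimate $f_\lambda = h + b$, with $h$ in an RKHS $\mathcal{H}_K$ and $b$ a constant, minimizes
\[
\frac{1}{n} \sum_{i=1}^{n} \bigl(1 - y_i f(x_i)\bigr)_+ \;+\; \lambda \|h\|_{\mathcal{H}_K}^2,
\qquad (\tau)_+ = \max(\tau, 0).
\]
In this setting the GCKL may be written as $\mathrm{GCKL}(\lambda) = E_{\mathrm{true}} \frac{1}{n} \sum_{i=1}^{n} \bigl(1 - y_i f_\lambda(x_i)\bigr)_+$, the expectation taken over new responses at the observed $x_i$. Since $(1 - yf)_+ \ge 1$ whenever $yf \le 0$, this quantity bounds the expected misclassification rate of the classifier $\mathrm{sign}\, f_\lambda$ from above, which is the simple upper bound noted above.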