Tuning an SVM Predictive Model for the MNIST Handwritten Digit Data

This also comes from the Columbia engineering course on machine learning, a class taught by a chief researcher at IBM.
Background:
Handwriting recognition is a well-studied subject in computer vision and has found wide applications in our daily life (such as USPS mail sorting). In this project, we will explore various machine learning techniques for recognizing handwritten digits. The dataset you will be using is the well-known MNIST dataset.
(1) The MNIST database of handwritten digits has a training set of 60,000 examples, and a test set of 10,000 examples. (http://yann.lecun.com/exdb/mnist/)
Below is an example of some digits from the MNIST dataset:

The goal of this project is to build a 10-class classifier to recognize those handwritten digits as accurately as you can. Though deep learning has been widely used for this dataset, in this project you should NOT use any deep neural nets (DNN) to do the recognition. Rather, you need to use the techniques we have learned so far in the class (such as logistic regression, SVM, etc.) plus some other reasonable non-DNN machine learning techniques (such as random forest, decision tree, etc. – though we have not covered those subjects in the class yet) to do the work.
Build a classifier using all pixels as features for handwriting recognition.
After loading the dataset into R, we have a training dataset and a test dataset.
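As a concrete illustration, one way to do the loading is a small readBin-based reader for the IDX files, assuming the four files from http://yann.lecun.com/exdb/mnist/ have been downloaded and gunzipped into the working directory (the function and object names below are my own, not taken from the original code):

```r
# Minimal IDX readers for the MNIST files (sketch; names are assumptions)
read_mnist_images <- function(path) {
  con <- file(path, "rb"); on.exit(close(con))
  hdr <- readBin(con, integer(), n = 4, size = 4, endian = "big")  # magic, n, rows, cols
  px  <- readBin(con, integer(), n = hdr[2] * hdr[3] * hdr[4],
                 size = 1, signed = FALSE)
  matrix(px, nrow = hdr[2], ncol = hdr[3] * hdr[4], byrow = TRUE)
}

read_mnist_labels <- function(path) {
  con <- file(path, "rb"); on.exit(close(con))
  hdr <- readBin(con, integer(), n = 2, size = 4, endian = "big")  # magic, n
  readBin(con, integer(), n = hdr[2], size = 1, signed = FALSE)
}

train_x <- read_mnist_images("train-images-idx3-ubyte")  # 60000 x 784
train_y <- factor(read_mnist_labels("train-labels-idx1-ubyte"))
test_x  <- read_mnist_images("t10k-images-idx3-ubyte")   # 10000 x 784
test_y  <- factor(read_mnist_labels("t10k-labels-idx1-ubyte"))
```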
Now we conduct the classification and produce a predictive model based on SVM. This is the original R code with default parameters:
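A minimal sketch of such a default run with e1071::svm (not the author's original code), reusing the objects loaded above, might look like this; scale = FALSE is added only to avoid warnings from constant all-zero border pixels, while the other arguments keep their defaults (RBF kernel, gamma = 1/ncol(x), cost = 1):

```r
library(e1071)

# Data frames with a factor label column (object names are assumptions)
train_df <- data.frame(label = train_y, train_x)
test_df  <- data.frame(label = test_y, test_x)

# Baseline fit with default kernel/gamma/cost.
# Note: fitting all 60,000 examples can be slow; a random subsample of
# train_df speeds up experimentation.
svm_fit <- svm(label ~ ., data = train_df, scale = FALSE)

pred <- predict(svm_fit, newdata = test_df)
mean(pred == test_df$label)    # test-set accuracy
table(pred, test_df$label)     # confusion matrix
```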

Typical arguments of the svm() function in the e1071 package of R include: formula, data, x, y, scale, kernel, degree, gamma, cost.
Kernel (http://stats.stackexchange.com/questions/73032/linear-kernel-and-non-linear-kernel-for-support-vector-machine)
Usually, the decision is whether to use a linear or an RBF (aka Gaussian) kernel. There are two main factors to consider: solving the optimisation problem for a linear kernel is much faster (see e.g. LIBLINEAR), while the best possible predictive performance is typically better for a nonlinear kernel (or at least as good as the linear one).
Gamma: parameter needed for all kernels except linear (default: 1/(data dimension)).
Cost: intuitively, the C parameter trades off misclassification of training examples against simplicity of the decision surface. A low C tends to make the decision surface smooth, while a high C tries to classify all training examples correctly by giving the model freedom to select more samples as support vectors.
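To make these arguments concrete, a single explicit call setting the kernel, gamma and cost could look like the following (the values are purely illustrative, not tuned):

```r
# Illustrative only: explicit kernel / gamma / cost, values not tuned
svm_rbf <- svm(label ~ ., data = train_df,
               kernel = "radial",  # RBF kernel
               gamma  = 0.01,      # width of the RBF; default is 1/ncol(x)
               cost   = 10,        # misclassification penalty C
               scale  = FALSE)
```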
Tuned code:
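A sketch of how such tuning might be done with e1071's tune.svm, which runs 10-fold cross-validation over a grid of gamma and cost values (the grid below is my own assumption, not the author's actual choice), is:

```r
# Grid search over gamma and cost with 10-fold cross-validation.
# On the full 60,000-example training set this is slow; tuning on a
# random subsample first is a common shortcut.
set.seed(1)
tuned <- tune.svm(label ~ ., data = train_df,
                  kernel = "radial",
                  gamma  = c(0.001, 0.01, 0.1),
                  cost   = c(1, 10, 100),
                  scale  = FALSE)

summary(tuned)                 # CV error for every gamma/cost pair
best_fit <- tuned$best.model   # model refit with the best pair

pred_tuned <- predict(best_fit, newdata = test_df)
mean(pred_tuned == test_df$label)   # test-set accuracy after tuning
```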
