MATLAB連續(xù)投影算法SPA篩選波段示例
高光譜篩選敏感波段,自己手頭的歷史代碼總感覺有差錯(cuò),在網(wǎng)上搜到的代碼略微難頂,要不是刪減過(guò)的要不是調(diào)用復(fù)雜,直到在CSDN發(fā)現(xiàn)一篇,參考著進(jìn)行
版權(quán)聲明:本文為CSDN博主「PeanutbutterBoh」的原創(chuàng)文章,遵循CC 4.0 BY-SA版權(quán)協(xié)議,轉(zhuǎn)載請(qǐng)附上原文出處鏈接及本聲明。
原文鏈接:https://blog.csdn.net/weixin_43637490/article/details/118468559
感謝PeanutbutterBoh提供的源文件與過(guò)程,侵必刪,本文僅供自用記錄,不做盈利,以下為原文較長(zhǎng),個(gè)人實(shí)操建議直接跳到下邊

連續(xù)投影算法(Successive Projections Algorithm,SPA)是一種使矢量空間共線性最小化的前向變量選擇算法, 它的優(yōu)勢(shì)在于提取全波段的幾個(gè)特征波長(zhǎng),能夠消除原始光譜矩陣中冗余的信息,可用于光譜特征波長(zhǎng)的篩選。
——百度百科
論文寫作需要用到SPA對(duì)高光譜數(shù)據(jù)進(jìn)行波段選擇,在網(wǎng)上找到相關(guān)代碼SPA_GUI? http://www.ele.ita.br/~kawakami/spa/
后不知道如何使用,于是參考幫助文檔一步一步的做。以下內(nèi)容來(lái)自幫助文檔內(nèi)第七章節(jié)示例。
鏈接:https://pan.baidu.com/s/1yoWpCvIneq5cfRxfZG8P5w
提取碼:ylhl
SPA工具、幫助文檔和示例數(shù)據(jù)都放在wangpan里了

前情提要
網(wǎng)上看的問題大多出在不知道Xcal、Ycal和Xval、Yval是什么東西,根據(jù)我的理解,Xcal、Ycal 分別是建模集的光譜反射率矩陣和待測(cè)物質(zhì)含量(比如植株含水量、葉面積指數(shù)、生物量等等這些生理參數(shù),我這里是拿農(nóng)業(yè)方面做例子,還可以是其他學(xué)科的一些參數(shù));Xval、Yval 分別是驗(yàn)證集或者叫預(yù)測(cè)集光譜反射率矩陣和待測(cè)物質(zhì)含量。就是說(shuō),同一批光譜數(shù)據(jù)分析之前先要?jiǎng)澐謽颖炯?,將所有樣本劃分為建模集和?yàn)證集。而且這些矩陣是行為樣本,列為波段,也就是說(shuō)第一行是樣本1,這一行從第一列開始往后都是這個(gè)樣本的反射率。
算法實(shí)現(xiàn)
1、數(shù)據(jù)設(shè)置
數(shù)據(jù)使用示例數(shù)據(jù),獲取數(shù)據(jù)代碼如下(也是從網(wǎng)站http://www.ele.ita.br/~kawakami/spa/Extract_corn_dataset.m直接獲取的):
原始數(shù)據(jù)放在wangpan里:鏈接:https://pan.baidu.com/s/1vdNrXhKgQlfGJM70jENOYQ
提取碼:bz2t
直接運(yùn)行,運(yùn)行時(shí)間很長(zhǎng)慢慢等待就好,運(yùn)行完畢后會(huì)出現(xiàn)corn_dataset.mat這個(gè)文件,打開可以看到,是一個(gè)80×701的一個(gè)矩陣,正如前情提要里說(shuō)到,行為樣本列為波段。根據(jù)幫助文檔里所寫,這個(gè)數(shù)據(jù)是80個(gè)玉米樣本反射率數(shù)據(jù)1到700列,而且第701列數(shù)據(jù)為玉米含水量(The data set used in this example consists of spectra from 80 corn samples, which were acquired in the range 1100–1498 nm, together with moisture content for each sample.光譜范圍是1100到1498但是為啥會(huì)有700個(gè)波段可能是幫助文檔寫錯(cuò)了吧)。

2、數(shù)據(jù)導(dǎo)入Loading the data
運(yùn)行SPA_GUI.p文件,會(huì)出現(xiàn)下列界面,然后選擇Load,把那個(gè)玉米數(shù)據(jù)集導(dǎo)入。
In the “Load Data” module (section 3) use the “l(fā)oad” button to load the contents of data file tothe workspace.Once the data file is loaded, the matrix “data” is presented at the “Data matrices in the workspace” group.

然后在Objects里選擇Matrix Columns,意思是按列選擇。然后點(diǎn)擊Edit,輸入1:700,點(diǎn)擊確定就會(huì)選擇前700列數(shù)據(jù),點(diǎn)擊Extract提取當(dāng)作X(后續(xù)還要對(duì)X進(jìn)行處理,處理得到Xcal、Xval)。
The next step is to split the data matrix will in the matrices X (instrumental responses) and Y (parameter of interesting). In order to do that, first select the “Matrix Columns” checkbox and press the “Edit” button in the “Objects” group. The following window will appear. In this window, select the columns with index of 1 to 700 and press the OK button.To extract the data to Y, repeat the process selecting only the column 701.



這個(gè)時(shí)候Data matrices in the workspace里就會(huì)出現(xiàn)提取的X。同樣方法提取第701列數(shù)據(jù)玉米含水量當(dāng)作Y(后續(xù)還要對(duì)Y進(jìn)行處理,處理得到Y(jié)cal、Yval)。

3、數(shù)據(jù)預(yù)處理Preprocessing
這個(gè)示例使用了Savitzky–Golay一階平滑消除光譜噪聲,降低環(huán)境背景干擾等因素的影響。當(dāng)然程序還提供了wavelet denoising小波去噪的處理方法。
This module contains the following groups: “Savitzky-Golay Smoothing”, “Savitzky-Golay Differentiation”, and “Wavelet Denoising”.
首先要先全部選擇X中的數(shù)據(jù)。Objects里選擇Matrix Rows,然后Select All。
To perform this preprocessing, select the matrix “X” in the “Data matrices in the workspace” group. Then, in the “Objects” group, select the “Matrix rows” option and press the button “Select All” to select all samples.

隨后切換到Data Pre-Processing界面,設(shè)置S-G平滑的相關(guān)參數(shù)。
Switch to the “Data Pre-Processing” module. In this module, inform the Savitzky-Golay filter parameters (the frame size, polynomial order, and differentiation order), according to the figure below.
Frame Size一定是奇數(shù),值越大,則平滑效果越明顯。
Polynomnial設(shè)置平滑多項(xiàng)式的次數(shù)。通常設(shè)置為2~4。較低的次數(shù)能夠產(chǎn)生平滑結(jié)果,但是有可能出現(xiàn)偏置。較高的次數(shù)能降低偏置,但有可能過(guò)擬合而導(dǎo)致結(jié)果噪聲過(guò)多。次數(shù)必須小于濾波器寬度,即Frame Size。
Order設(shè)置導(dǎo)數(shù)階數(shù)。設(shè)置為0,表示僅平滑;設(shè)置為1,表示一階導(dǎo)數(shù)平滑結(jié)果;設(shè)置為2,表示二階導(dǎo)數(shù)。以此類推(Order必須小于等于Polynomial)。
The frame size must be odd, and the polynomial order must be less than the frame length. If invalid parameters are entered, error messages will appear.
The same parameters specified for Savitzky-Golay smoothing will be also used for Savitzky- Golay differentiation. To run the differentiation, it is also necessary to specify the differentiation order (1 or 2, meaning first or second derivative).

然后Apply即可,會(huì)出現(xiàn)S-G平滑后的結(jié)果。

可以用+ -來(lái)修改S-G平滑的參數(shù)。
If want to test other preprocessing configurations, press the “+” and “-” buttons to change the frame size and polynomial order.
隨后點(diǎn)擊Save Signal來(lái)保存S-G平滑結(jié)果,結(jié)果保存為Xnew 。
Press the “Save signal” button in the Savitzky-Golay screen to save the processed samples. A window requesting the name of the matrix will appear. In this window, inform the name of the matrix as “Xnew”.
4、樣本選擇Selection of samples
這個(gè)示例使用了Kennard-Stone算法來(lái)選擇樣本。當(dāng)然程序還提供了隨機(jī)選擇Random Sampling和SPXY算法。
把這80個(gè)樣本分成40建模20驗(yàn)證20預(yù)測(cè),切換到Sample Selection界面,參數(shù)設(shè)置如下圖所示。
The KS algorithm is used to divide the available samples into calibration, validation, and prediction sets. The corn data were divided into 40 samples for calibration, 20 samples for validation, and 20 samples for prediction. These sets are used for model-building and performance evaluation.
To select the 40 samples for calibration, switch to the “Sample Selection” module and set the parameters as in the following figure.

點(diǎn)擊run,便會(huì)對(duì)X和Y數(shù)據(jù)進(jìn)行選擇。

可以看到,Xnew_ks_sel包含被選擇作為建模的樣本,也就是Xcal,而Xnew_ks_notsel中是未被選中建模的樣本,這個(gè)矩陣中包含驗(yàn)證和預(yù)測(cè)的樣本所以要進(jìn)一步拆分。Y_ks_sel就是Ycal
The “Xnew_ks_sel” matrix, which contains the selected samples, is the set to be used for calibration. The “Xnew_ks_notsel’ matrix, which contains the samples that were not selected, will be divided in two sets, for validation and prediction.
Before dividing the “Xnew_ks_notsel” matrix, it will first be ordered according with Euclidian distances using the KS algorithm. This procedure can be performed by setting the parameters as in the following figure.
對(duì)Xnew_ks_notsel進(jìn)一步拆分,參數(shù)設(shè)置如下。(根據(jù)自己的需要來(lái)設(shè)置,如果需要預(yù)測(cè)集就進(jìn)行拆分,如果不需要預(yù)測(cè)集直接開始運(yùn)行SPA了,Xnew_ks_notsel就是Xval。)


雖然設(shè)置的還是40,而且Xnew_ks_notsel_ks_sel與Xnew_ks_notsel所含內(nèi)容是一致的,但是新生成的Xnew_ks_notsel_ks_sel是按照歐氏距離進(jìn)行排序的。為了對(duì)其進(jìn)行選擇,切換到Load Data界面,在Data matrices in the workspace界面中選擇Xnew_ks_notsel_ks_sel。
The new matrix “Xnew_ks_notsel_ks_sel” contains the same samples of “Xnew_ks_notsel”, but ordered by distance. In order to split this matrix in the validation and prediction sets, switch to the “Load Data” module and select the “Xnew_ks_notsel_ks_sel” matrix in the “Data matrices in the workspace” group.
然后點(diǎn)擊Edit,參數(shù)設(shè)置為1:2:40,意思是以2為間隔對(duì)數(shù)據(jù)進(jìn)行選擇。
Then, press the “Edit” button in the “Objects” group and select the samples 1:2:40, as illustrated in the figure below.

然后將其提取,點(diǎn)擊Extract作為Xval。
In order to extract the selected samples, press the “Extract” button in the “Data matrices in the workspace” group. The following window will appear. In this window, inform the name of the new matrix as “Xval” and press the OK button.

After that, press the “Export Selection” button in the “Objects” group to export the selected indices (for future use). The following window will appear. Inform the name of the array as “idx_validation”.
然后開始提取Yval。選擇Y_ks_notsel_ks_sel,可以按照提取Xval的方法(就是設(shè)置1:2:40),也可以使用剛才導(dǎo)出的idx_validation。點(diǎn)擊Select from array,并選擇idx_validation,這樣就會(huì)保證Xval與Yval是對(duì)應(yīng)的(也就是一組反射率對(duì)應(yīng)一個(gè)生理參數(shù))。
Now, select the “Y_ks_notsel_ks_sel” matrix in the “Data matrices in the workspace” group to extract the moisture content for the validation samples.
The same indices used for matrix Xval must be used for matrix Yval. These indices can be informed by using the same procedure used above. An alternative procedure is to press the “Select from array” button in the “Objects” group. After pressing this button, a list of numeric matrices is presented. In this list, select the “idx_validation” array that was saved before. This will ensure that the same indices are used for Xval and Yval matrix.

選擇預(yù)測(cè)數(shù)據(jù)集。先選擇Xnew_ks_notsel_ks_sel,然后選擇Select from array,再選idx_validation,再點(diǎn)擊Invert就會(huì)反向選擇,意思就是把Xnew_ks_notsel_ks_sel剩下的數(shù)據(jù)全選上。
To choose the samples of prediction, select the matrix Xnew_ks_notsel_ks_sel again in the “Data matrices in the workspace” group. Use the “Select from array” button to load again the list of indices available in the “idx_validation” array.
Press the “Invert” button in the “Objects” group to invert the selection, i.e., to select the remaining samples.

然后Extract,提取為Xpred。
Now, set the matrix X for prediction using the “Extract” button in the “Data matrices in the workspace” group.

提取Ypred是同樣的方法。
Select the matrix Y_ks_notsel_ks_sel in the “Data matrices in the workspace” group to extract the moisture content for the prediction samples.
Now, set the matrix Y for prediction using the “Extract” button in the “Data matrices in the workspace” group.

5、SPA算法運(yùn)行Variable selection for multivariate calibration using SPA
5.1 建模
切換到SPA界面,把之前提取的Xcal等等參數(shù)設(shè)置一下,再設(shè)置篩選波段數(shù)量最大最小值即可。下圖的參數(shù)都是按照幫助文檔里的參數(shù)設(shè)置的,可以根據(jù)自己需要進(jìn)行設(shè)置最大最小值。波段選擇數(shù)據(jù)存儲(chǔ)為var_sel。
To use the SPA algorithm, set the calibration and validation matrices, minimum and maximum number of variables as described in the section 6.1. In this example, specify the parameters as in the following figure and press the “Run SPA” button.
這個(gè)m_max一定小于樣本數(shù)-1,但是也不一定非得設(shè)置為樣本數(shù)-1,可以適當(dāng)小一點(diǎn)這樣才能選的多。有時(shí)候設(shè)置的大了SPA才選了1個(gè)波段出來(lái)。

run之后會(huì)出現(xiàn)兩個(gè)圖,左邊是誤差分析圖,右邊是波段選擇圖??梢钥吹皆诘?7次迭代時(shí)RMSE達(dá)到最小,于是選擇了17個(gè)波段。波段選擇的數(shù)據(jù)可以打開var_sel查看。
Two figures are presented: scree plot and the variables selected.
The scree plot tends to level off after a certain number of variables is added to the model. The number of variables selected in the third phase of SPA is indicated by square marker. This is the point at which the RMSE is not significantly larger than RMSEmin according to an F-test with a = 0.25.
The variables selected by SPA are plotted at the first calibration samples. This figure is presented below.


如果想保存RMSE這個(gè)圖的數(shù)據(jù),在RMSE這個(gè)figure里,點(diǎn)擊刷亮/選擇數(shù)據(jù),然后全選數(shù)據(jù),右鍵選擇創(chuàng)建變量,保存RMSE數(shù)據(jù)即可。

5.2 樣本預(yù)測(cè)
一般到SPA算法運(yùn)行部分就可以了,樣本預(yù)測(cè)這個(gè)部分我的論文里沒有涉及,所以直接上幫助文檔。
In this example, specify the “Prediction” group parameters as in the following figure.
The graph reference versus predicted is presented together with the statistics parameters PRESS, RMSEP, SDV, and r, for the prediction set. The figure below shows the obtained results.

To know the statistics parameter of the validation set, use the validation matrices in the spaces of prediction (Xpred and ypred). If they are left blank, leave-one-out cross-validation will be carried out in the calibration samples.
意思是如果Xpred和Ypred沒有設(shè)置的話,會(huì)使用留一交叉驗(yàn)證法進(jìn)行預(yù)測(cè)。

做完之后保存數(shù)據(jù)。
Switch to the “Load Data” module and press either the “Save” or the “Save As” button in the “Data File” group to save the data matrices to the data file (file with .mat extension).
In the main menu, choose the option “File: Save” or “File: Save As” to save the configuration file (file with .spr extension used to store the parameters specified in the graphical user interface).

終于到實(shí)操了,首先是下載包

PDF英文版的,沒什么看的必要,按操作來(lái)就行了,然后就是matlab添加文件路徑,將該程序?qū)脒M(jìn)去。然后雙擊或右鍵運(yùn)行。但這時(shí)會(huì)出現(xiàn)一個(gè)很嚴(yán)重的問題,由于版本或分辨率問題,導(dǎo)致SPA_GUI程序顯示不完全,只有一半綠色,怎么調(diào)也調(diào)不過(guò)來(lái)。CSDN評(píng)論區(qū)有人指出將分辨率調(diào)整為1280*1024可以完整顯示,并且在之后可以更改為原本分辨率。但是嘗試過(guò)后我的主機(jī)依舊無(wú)法顯示,直到換為筆記本電腦即可以。我懷疑這有玄學(xué)存在。

換成筆記本后,想清楚自己的所想要達(dá)到的目的,我只需要進(jìn)行波段篩選即可,無(wú)所謂建模驗(yàn)證預(yù)測(cè),因此直接新建兩個(gè)參數(shù)REFLE、PARAMETER(隨意命名),然后按照X(各波段數(shù)值)*Y(樣本),X(1)*Y(樣本)進(jìn)行復(fù)制導(dǎo)入。然后進(jìn)行程序的讀入,點(diǎn)一下Update即可刷出

然后點(diǎn)擊SPA按鈕,在Xcal和ycal兩個(gè)選項(xiàng)中分別選中REFLE和PARAMETER
并在m_min和m_max兩個(gè)選項(xiàng)中輸入1與波段數(shù)量n-1兩個(gè)值,具體解釋翻上邊,我是沒碰見將值選擇過(guò)大導(dǎo)致選擇波段只有一個(gè)的情況,并且m_max的值的變化不是很影響最終結(jié)果。?此設(shè)置僅供參考,自己多試試就知道了、

然后單擊Run SPA,程序彈出窗口進(jìn)行運(yùn)行

運(yùn)行完成

兩張圖片重疊在了一起,拖拽一下即可查看
首先是Fig1 代表著隨著選擇特征波段數(shù)量的增加RMSE的變化趨勢(shì)

這里邊需要注意的點(diǎn)就是最終選擇結(jié)果并不是以RMSE最低的位置進(jìn)行選定。
原文文檔中這樣解釋
The scree plot tends to level off after a certain number of variables is added to the model. The number of variables selected in the third phase of SPA is indicated by square marker. This is the point at which the RMSE is not significantly larger than RMSEmin according to an F-test with a = 0.25.
大意就是最終選擇的特征點(diǎn)數(shù)經(jīng)過(guò)F檢驗(yàn)選取的是不顯著大于RMSE最小值的點(diǎn)
然后選中

全部拖中



自己保存即可
Fig2的內(nèi)容看情況選取,大致就是自己的波段選擇的是哪幾個(gè)以及其波段值。感覺沒什么用只是個(gè)示意圖,可以直接叉掉。我的是一階導(dǎo)數(shù),不要介意我的數(shù)值大小

打開程序輸出的var_sel參數(shù),將其輸出結(jié)果保存

自己排序選擇即可,這玩意兒就是個(gè)序號(hào)