Feature Selection Methods in R: Best Subset Regression and Stepwise Regression | With Code and Data
Original article: http://tecdat.cn/?p=5453
We were recently asked by a client to write a research report on feature selection methods, including graphics and statistical output.
Variable selection methods
All possible regressions
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
ols_all_subset(model)
## # A tibble: 15 x 6
##    Index     N       Predictors `R-Square` `Adj. R-Square` `Mallow's Cp`
##  1     1     1               wt    0.75283         0.74459      12.48094
##  2     2     1             disp    0.71834         0.70895      18.12961
##  3     3     1               hp    0.60244         0.58919      37.11264
##  4     4     1             qsec    0.17530         0.14781     107.06962
##  5     5     2            hp wt    0.82679         0.81484       2.36900
##  6     6     2          wt qsec    0.82642         0.81444       2.42949
##  7     7     2          disp wt    0.78093         0.76582       9.87910
##  8     8     2          disp hp    0.74824         0.73088      15.23312
##  9     9     2        disp qsec    0.72156         0.70236      19.60281
## 10    10     2          hp qsec    0.63688         0.61183      33.47215
## 11    11     3       hp wt qsec    0.83477         0.81706       3.06167
## 12    12     3       disp hp wt    0.82684         0.80828       4.36070
## 13    13     3     disp wt qsec    0.82642         0.80782       4.42934
## 14    14     3     disp hp qsec    0.75420         0.72786      16.25779
## 15    15     4  disp hp wt qsec    0.83514         0.81072       5.00000
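For reference, the 15 rows above are every non-empty subset of the four candidate predictors; with k candidates, an all-possible-regressions search fits 2^k - 1 models, so the search grows quickly as predictors are added. A quick check in R:

# number of non-empty predictor subsets searched by all-possible regressions
n_predictors <- 4
2^n_predictors - 1
## [1] 15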
The plot method displays the fit of all possible regression models.
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
k <- ols_all_subset(model)
plot(k)


Best subset regression
Select the subset of predictors that does best against some well-defined objective criterion, such as having the largest R² value or the smallest MSE, Cp, or AIC.
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
ols_best_subset(model)
##    Best Subsets Regression
## ------------------------------
## Model Index    Predictors
## ------------------------------
##      1         wt
##      2         hp wt
##      3         hp wt qsec
##      4         disp hp wt qsec
## ------------------------------
##
##                                     Subsets Regression Summary
## -------------------------------------------------------------------------------------------------
##                        Adj.        Pred
## Model    R-Square    R-Square    R-Square     C(p)        AIC        SBIC        SBC        MSEP      FPE       HSP       APC
## -------------------------------------------------------------------------------------------------
##   1        0.7528      0.7446      0.7087    12.4809    166.0294    74.2916    170.4266    9.8972    9.8572    0.3199    0.2801
##   2        0.8268      0.8148      0.7811     2.3690    156.6523    66.5755    162.5153    7.4314    7.3563    0.2402    0.2091
##   3        0.8348      0.8171       0.782     3.0617    157.1426    67.7238    164.4713    7.6140    7.4756    0.2461    0.2124
##   4        0.8351      0.8107       0.771     5.0000    159.0696    70.0408    167.8640    8.1810    7.9497    0.2644    0.2259
## -------------------------------------------------------------------------------------------------
## AIC: Akaike Information Criteria
## SBIC: Sawa's Bayesian Information Criteria
## SBC: Schwarz Bayesian Criteria
## MSEP: Estimated error of prediction, assuming multivariate normality
## FPE: Final Prediction Error
## HSP: Hocking's Sp
## APC: Amemiya Prediction Criteria
The plot method:
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
k <- ols_best_subset(model)
plot(k)
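As a cross-check, the same exhaustive search can be sketched with the leaps package (an assumption here: leaps is installed; it is not used elsewhere in this article). regsubsets() reports the best model of each size, which can then be compared by adjusted R² or Mallows' Cp:

# Best-subset search with leaps (alternative to olsrr, shown for comparison)
library(leaps)

best <- regsubsets(mpg ~ disp + hp + wt + qsec, data = mtcars, nvmax = 4)
best_summary <- summary(best)

best_summary$adjr2  # adjusted R-squared of the best model of each size
best_summary$cp     # Mallows' Cp of the best model of each size
best_summary$which  # which predictors enter the best model of each size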



Stepwise forward regression
Build a regression model from a set of candidate predictor variables by entering predictors one at a time based on their p-values, until no remaining variable qualifies for entry. The model passed in should include all candidate predictor variables. If details is set to TRUE, each step is displayed.
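A minimal sketch of what that looks like, assuming the olsrr version used here exposes the details argument described above (applied to the mtcars model from earlier; the full surgical-data example follows below):

# Forward selection with per-step output printed (sketch; `details` as described above)
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
ols_step_forward(model, details = TRUE)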
Variable selection
# forward stepwise regression
model <- lm(y ~ ., data = surgical)
ols_step_forward(model)
## We are selecting variables based on p value...
## 1 variable(s) added....
## 1 variable(s) added...
## 1 variable(s) added...
## 1 variable(s) added...
## 1 variable(s) added...
## No more variables satisfy the condition of penter: 0.3
## Forward Selection Method
##
## Candidate Terms:
##
## 1 . bcs
## 2 . pindex
## 3 . enzyme_test
## 4 . liver_test
## 5 . age
## 6 . gender
## 7 . alc_mod
## 8 . alc_heavy
##
## ------------------------------------------------------------------------------
##                                Selection Summary
## ------------------------------------------------------------------------------
##         Variable                     Adj.
## Step      Entered      R-Square    R-Square     C(p)        AIC         RMSE
## ------------------------------------------------------------------------------
##    1    liver_test       0.4545      0.4440    62.5119    771.8753    296.2992
##    2    alc_heavy        0.5667      0.5498    41.3681    761.4394    266.6484
##    3    enzyme_test      0.6590      0.6385    24.3379    750.5089    238.9145
##    4    pindex           0.7501      0.7297     7.5373    735.7146    206.5835
##    5    bcs              0.7809      0.7581     3.1925    730.6204    195.4544
## ------------------------------------------------------------------------------
model <- lm(y ~ ., data = surgical)
k <- ols_step_forward(model)
## We are selecting variables based on p value...
## 1 variable(s) added....
## 1 variable(s) added...
## 1 variable(s) added...
## 1 variable(s) added...
## 1 variable(s) added...
## No more variables satisfy the condition of penter: 0.3
plot(k)
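For comparison, base R's step() also performs forward selection, but it adds variables by AIC rather than by p-values, so the path it takes can differ from the olsrr result above. A minimal sketch on the same surgical data (loaded from olsrr):

# Forward selection by AIC with base R's step() (comparison sketch)
library(olsrr)  # provides the surgical data used above

null_model <- lm(y ~ 1, data = surgical)  # intercept-only starting point
full_model <- lm(y ~ ., data = surgical)  # all candidate predictors

step(null_model,
     scope = list(lower = null_model, upper = full_model),
     direction = "forward")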



This article is excerpted from "R語言特征選擇——逐步回歸" (Feature Selection in R: Stepwise Regression).