散文網(wǎng) » 生活 »日常 » R語(yǔ)言曲線回歸：多項(xiàng)式回歸、多項(xiàng)式樣條回歸、非線性回歸數(shù)據(jù)分析

R語(yǔ)言曲線回歸：多項(xiàng)式回歸、多項(xiàng)式樣條回歸、非線性回歸數(shù)據(jù)分析

2021-03-01 23:50 作者:拓端tecdat 0人讀過(guò) | 我要投稿

?原文鏈接：http://tecdat.cn/?p=9508

?

本文將使用三種方法使模型適合曲線數(shù)據(jù)：1）多項(xiàng)式回歸；2）用多項(xiàng)式樣條進(jìn)行B樣條回歸；3）進(jìn)行非線性回歸。在此示例中，這三個(gè)中的每一個(gè)都將找到基本相同的最佳擬合曲線。

?

多項(xiàng)式回歸

多項(xiàng)式回歸實(shí)際上只是多元回歸的一種特殊情況。

?

對(duì)于線性模型（lm），調(diào)整后的R平方包含在summary（model）語(yǔ)句的輸出中。AIC是通過(guò)其自己的函數(shù)調(diào)用AIC（model）生成的。使用將方差分析函數(shù)應(yīng)用于兩個(gè)模型進(jìn)行額外的平方和檢驗(yàn)。?

?

對(duì)于AIC，越小越好。對(duì)于調(diào)整后的R平方，越大越好。將模型a與模型b進(jìn)行比較的額外平方和檢驗(yàn)的非顯著p值表明，帶有額外項(xiàng)的模型與縮小模型相比，并未顯著減少平方誤差和。也就是說(shuō)，p值不顯著表明帶有附加項(xiàng)的模型并不比簡(jiǎn)化模型好。

?

Data = read.table(textConnection(Input),header=TRUE)
### Change Length from integer to numeric variable
### ? otherwise, we will get an integer overflow error on big numbers
Data$Length = as.numeric(Data$Length)
### Create quadratic, cubic, quartic variables
library(dplyr)
Data =
mutate(Data,
Length2 = Length*Length,
Length3 = Length*Length*Length,
Length4 = Length*Length*Length*Length)
library(FSA)
headtail(Data)
Length Clutch Length2 ?Length3 ? ? Length4
1 ? ? 284 ? ? ?3 ? 80656 22906304 ?6505390336
2 ? ? 290 ? ? ?2 ? 84100 24389000 ?7072810000
3 ? ? 290 ? ? ?7 ? 84100 24389000 ?7072810000
16 ? ?323 ? ? 13 ?104329 33698267 10884540241
17 ? ?334 ? ? ?2 ?111556 37259704 12444741136
18 ? ?334 ? ? ?8 ?111556 37259704 12444741136

?

定義要比較的模型

?

model.1 = lm (Clutch ~ Length, ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? data=Data)
model.2 = lm (Clutch ~ Length + Length2, ? ? ? ? ? ? ? ? ? ? data=Data)
model.3 = lm (Clutch ~ Length + Length2 + Length3, ? ? ? ? ? data=Data)
model.4 = lm (Clutch ~ Length + Length2 + Length3 + Length4, data=Data)

?

生成這些模型的模型選擇標(biāo)準(zhǔn)統(tǒng)計(jì)信息

?

summary(model.1)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) ?-0.4353 ? ?17.3499 ? -0.03 ? ? 0.98
Length ? ? ? ?0.0276 ? ? 0.0563 ? ?0.49 ? ? 0.63
Multiple R-squared: ?0.0148, ?Adjusted R-squared: ?-0.0468
F-statistic: 0.24 on 1 and 16 DF, ?p-value: 0.631
AIC(model.1)
[1] 99.133
summary(model.2)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -9.00e+02 ? 2.70e+02 ? -3.33 ? 0.0046 **
Length ? ? ? 5.86e+00 ? 1.75e+00 ? ?3.35 ? 0.0044 **
Length2 ? ? -9.42e-03 ? 2.83e-03 ? -3.33 ? 0.0045 **
Multiple R-squared: ?0.434, ? Adjusted R-squared: ?0.358
F-statistic: 5.75 on 2 and 15 DF, ?p-value: 0.014
AIC(model.2)
[1] 91.16157
anova(model.1, model.2)
Analysis of Variance Table
Res.Df ? ?RSS Df Sum of Sq ? ? ?F ?Pr(>F)
1 ? ? 16 186.15
2 ? ? 15 106.97 ?1 ? ?79.178 11.102 0.00455 **

?

?其余模型繼續(xù)此過(guò)程

?

四個(gè)多項(xiàng)式模型的模型選擇標(biāo)準(zhǔn)。模型2的AIC最低，表明對(duì)于這些數(shù)據(jù)，它是此列表中的最佳模型。同樣，模型2顯示了最大的調(diào)整后R平方。最后，額外的SS測(cè)試顯示模型2優(yōu)于模型1，但模型3并不優(yōu)于模型2。所有這些證據(jù)表明選擇了模型2。

模型

?

AIC

?

調(diào)整后的R平方

?

p值

1

?

99.1

?

-0.047

?

2

?

91.2

?

?? 0.36

?

0.0045

3

?

92.7

?

?? 0.33

?

0.55

4

?

94.4

?

?? 0.29

?

0.64

?

對(duì)比與方差分析

AIC，AICc或BIC中的任何一個(gè)都可以最小化以選擇最佳模型。

?

$Fit.criteria
Rank Df.res ? AIC ? AICc ? ?BIC R.squared Adj.R.sq p.value Shapiro.W Shapiro.p
1 ? ?2 ? ? 16 99.13 100.80 101.80 ? 0.01478 ?-0.0468 0.63080 ? ?0.9559 ? ?0.5253
2 ? ?3 ? ? 15 91.16 ?94.24 ?94.72 ? 0.43380 ? 0.3583 0.01403 ? ?0.9605 ? ?0.6116
3 ? ?4 ? ? 14 92.68 ?97.68 ?97.14 ? 0.44860 ? 0.3305 0.03496 ? ?0.9762 ? ?0.9025
4 ? ?5 ? ? 13 94.37 102.00 ?99.71 ? 0.45810 ? 0.2914 0.07413 ? ?0.9797 ? ?0.9474
Res.Df ? ?RSS Df Sum of Sq ? ? ? F ? Pr(>F)
1 ? ? 16 186.15
2 ? ? 15 106.97 ?1 ? ?79.178 10.0535 0.007372 ** ?## Compares m.2 to m.1
3 ? ? 14 104.18 ?1 ? ? 2.797 ?0.3551 0.561448 ? ? ## Compares m.3 to m.2
4 ? ? 13 102.38 ?1 ? ? 1.792 ?0.2276 0.641254 ? ? ## Compares m.4 to m.3

?

研究最終模型

?

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -9.00e+02 ? 2.70e+02 ? -3.33 ? 0.0046 **
Length ? ? ? 5.86e+00 ? 1.75e+00 ? ?3.35 ? 0.0044 **
Length2 ? ? -9.42e-03 ? 2.83e-03 ? -3.33 ? 0.0045 **
Multiple R-squared: ?0.434, ? Adjusted R-squared: ?0.358
F-statistic: 5.75 on 2 and 15 DF, ?p-value: 0.014
Anova Table (Type II tests)
Response: Clutch
Sum Sq Df F value Pr(>F)
Length ? ? ?79.9 ?1 ? ?11.2 0.0044 **
Length2 ? ? 79.2 ?1 ? ?11.1 0.0045 **
Residuals ?107.0 15

?

模型的簡(jiǎn)單圖解

?

?

檢查模型的假設(shè)

?

?

線性模型中殘差的直方圖。這些殘差的分布應(yīng)近似正態(tài)。

?

?

殘差與預(yù)測(cè)值的關(guān)系圖。殘差應(yīng)無(wú)偏且均等。?

?

###通過(guò)以下方式檢查其他模型：

?

具有多項(xiàng)式樣條的B樣條回歸

B樣條回歸使用線性或多項(xiàng)式回歸的較小部分。它不假設(shè)變量之間存在線性關(guān)系，但是殘差仍應(yīng)是獨(dú)立的。該模型可能會(huì)受到異常值的影響。

?

### --------------------------------------------------------------
### B-spline regression, turtle carapace example
### --------------------------------------------------------------
summary(model) ? ? ? ? ? ? ? ? ? ? ? ? # Display p-value and R-squared
Residual standard error: 2.671 on 15 degrees of freedom
Multiple R-squared: ?0.4338, ?Adjusted R-squared: ?0.3583
F-statistic: 5.747 on 2 and 15 DF, ?p-value: 0.01403

?

模型的簡(jiǎn)單圖解

?

?

檢查模型的假設(shè)

?

線性模型中殘差的直方圖。這些殘差的分布應(yīng)近似正態(tài)。

?

?

殘差與預(yù)測(cè)值的關(guān)系圖。殘差應(yīng)無(wú)偏且均等。?

?

??

?

非線性回歸

非線性回歸可以將各種非線性模型擬合到數(shù)據(jù)集。這些模型可能包括指數(shù)模型，對(duì)數(shù)模型，衰減曲線或增長(zhǎng)曲線。通過(guò)迭代過(guò)程，直到一定的收斂條件得到滿足先后找到更好的參數(shù)估計(jì)。

在此示例中，我們假設(shè)要對(duì)數(shù)據(jù)擬合拋物線。

數(shù)據(jù)中包含變量（Clutch和Length），以及我們要估計(jì)的參數(shù)（Lcenter，Cmax和a）。?

沒(méi)有選擇參數(shù)的初始估計(jì)的固定過(guò)程。通常，參數(shù)是有意義的。這里L(fēng)center?是頂點(diǎn)的x坐標(biāo)，Cmax是頂點(diǎn)的y坐標(biāo)。因此我們可以猜測(cè)出這些合理的值。?盡管我們知道參數(shù)a應(yīng)該是負(fù)的，因?yàn)閽佄锞€向下打開(kāi)。

因?yàn)閚ls使用基于參數(shù)初始估計(jì)的迭代過(guò)程，所以如果估計(jì)值相差太遠(yuǎn)，它將無(wú)法找到解決方案，它可能會(huì)返回一組不太適合數(shù)據(jù)的參數(shù)估計(jì)。繪制解決方案并確保其合理很重要。

如果您希望模型具有整體p值，并且模型具有偽R平方，則需要將模型與null模型進(jìn)行比較。從技術(shù)上講，要使其有效，必須將null模型嵌套在擬合模型中。這意味著null模型是擬合模型的特例。

?

對(duì)于沒(méi)有定義r平方的模型，已經(jīng)開(kāi)發(fā)了各種偽R平方值。

?

### --------------------------------------------------------------
### Nonlinear regression, turtle carapace example
### --------------------------------------------------------------
Data = read.table(textConnection(Input),header=TRUE)
Parameters:
Estimate Std. Error t value Pr(>|t|)
Lcenter 310.72865 ? ?2.37976 ?130.57 ?< 2e-16 ***
Cmax ? ? 10.05879 ? ?0.86359 ? 11.65 ?6.5e-09 ***
a ? ? ? ?-0.00942 ? ?0.00283 ? -3.33 ? 0.0045 **

?

確定總體p值和偽R平方

?

anova(model, model.null)
Res.Df Res.Sum Sq Df ?Sum Sq F value ?Pr(>F)
1 ? ? 15 ? ? 106.97
2 ? ? 17 ? ? 188.94 -2 -81.971 ? 5.747 0.01403 *
$Pseudo.R.squared.for.model.vs.null
Pseudo.R.squared
McFadden ? ? ? ? ? ? ? ? ? ? ? ? ? ? 0.109631
Cox and Snell (ML) ? ? ? ? ? ? ? ? ? 0.433836
Nagelkerke (Cragg and Uhler) ? ? ? ? 0.436269

?

確定參數(shù)的置信區(qū)間

?

2.5 % ? ? ? ?97.5 %
Lcenter 305.6563154 315.800988774
Cmax ? ? ?8.2180886 ?11.899483768
a ? ? ? ?-0.0154538 ?-0.003395949
------
Bootstrap statistics
Estimate ?Std. error
Lcenter 311.07998936 2.872859816
Cmax ? ? 10.13306941 0.764154661
a ? ? ? ?-0.00938236 0.002599385
------
Median of bootstrap estimates and percentile confidence intervals
Median ? ? ? ? 2.5% ? ? ? ? 97.5%
Lcenter 310.770796703 306.78718266 316.153528168
Cmax ? ? 10.157560932 ? 8.58974408 ?11.583719723
a ? ? ? ?-0.009402318 ?-0.01432593 ?-0.004265714

?

模型的簡(jiǎn)單圖解

?

檢查模型的假設(shè)

?

線性模型中殘差的直方圖。這些殘差的分布應(yīng)近似正態(tài)。

?

plot(fitted(model),
residuals(model))

?

?

殘差與預(yù)測(cè)值的關(guān)系圖。殘差無(wú)偏且均等。?

?

標(biāo)簽：