
Classifying the iris data in R with logistic regression, one-way ANOVA, outlier analysis, and visualization

2022-07-30 21:05 Author: 拓端tecdat

Full text: http://tecdat.cn/?p=27650

Source: the 拓端數據部落 WeChat official account

Abstract

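This article explores the relationships among three variables in Fisher and Anderson's iris data set, in particular a logistic regression of the dependent variable Species, restricted to the levels virginica and versicolor, on the predictors petal length and petal width. One-way ANOVA and data visualization both identify one level of the dependent variable, I. setosa, as easily linearly separable from the other two, with a clearly distinct mean and variance, so it is not of interest for the logistic regression.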

Introduction

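A first look at the iris data raises an immediate question about the nature of the data set itself: why collect such simple data? In fact, one of our first instincts was to ask whether, given the information in the data set, it would be possible to build a model that classifies new observations once the relevant analysis and diagnostics have been carried out.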

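We were surprised and pleased to learn that the data set is commonly analyzed for exactly this purpose. Its most common use is in machine learning, particularly in classification and pattern-recognition applications. We set out to examine part of the data with the tools we have learned so far: we use logistic regression on two of the iris species, I. versicolor and I. virginica (denoted π = 0 and π = 1, respectively, so that the fitted odds are those of I. virginica). The third species, I. setosa, is excluded because it is well separated from the other two in every dimension.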
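The separation of I. setosa can be checked with a one-way ANOVA. The following is a minimal sketch on R's built-in `iris` data; the choice of petal length as the response and the names `petal.aov` and `petal.tukey` are assumptions made for illustration, since the source does not show its ANOVA code.

```r
# One-way ANOVA of petal length across the three species (built-in iris data)
petal.aov <- aov(Petal.Length ~ Species, data = iris)
summary(petal.aov)

# Per-species mean and variance of petal length
aggregate(Petal.Length ~ Species, data = iris,
          FUN = function(x) c(mean = mean(x), var = var(x)))

# Tukey's HSD: pairwise differences between the species means
petal.tukey <- TukeyHSD(petal.aov)
petal.tukey
```

The same calls can be repeated with `Petal.Width` as the response to compare the second predictor.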

Methods

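Logistic regression is more appropriate here than a chi-square or Fisher's exact test because we have a binary dependent variable and several predictors; it also lets us quantify the strength of each effect (the odds ratio for each parameter) while controlling for the other variables.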
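The code in this article refers to a data frame `irises` and a fitted model `logit.fit` without showing how they were created. A plausible construction, offered only as a sketch under that assumption, is to drop I. setosa from the built-in `iris` data and fit a binomial GLM on the two petal measurements:

```r
# Assumed construction of `irises`: keep the two overlapping species and
# drop the unused "setosa" factor level.
irises <- droplevels(subset(iris, Species != "setosa"))

# Assumed construction of `logit.fit`: a binomial GLM. R treats the first
# factor level (versicolor) as 0 and the second (virginica) as 1.
logit.fit <- glm(Species ~ Petal.Length + Petal.Width,
                 data = irises, family = binomial)
summary(logit.fit)
```

With this construction `Species` remains the fifth column, so an expression such as `irises[,5]` still selects the true species.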

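library(splines)  # provides bs()
# Residuals against the linear predictor; the axis labels are assumed because the original call is truncated
plot(predict(logit.fit), residuals(logit.fit), xlab = "Linear predictor", ylab = "Residuals")
# Smooth the residuals with a B-spline regression on 8 degrees of freedom
rl <- lm(residuals(logit.fit) ~ bs(predict(logit.fit), 8))
# rl <- loess(residuals(logit.fit) ~ predict(logit.fit))
y <- predict(rl, se.fit = TRUE)
# Band of +/- 2 standard errors; every argument after the first is reconstructed, not taken from the source
segments(predict(logit.fit), y$fit + 2 * y$se.fit, predict(logit.fit), y$fit - 2 * y$se.fit)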


Results

A logistic model was fitted; its general form and parameter estimates are as follows:

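The meaning of the parameter estimates is summarized most effectively by their odds ratios. The intercept is clearly not of much interest, since the data point (0, 0) is impossible in theory and lies far outside the range of the data we collected. The odds ratios for β1 and β2 are more interesting: each is the factor by which the odds that a given plant belongs to I. virginica are multiplied for a one-unit increase in the corresponding variable while the other is held constant. Here, increasing petal width clearly has a very large effect on the odds that a plant is classified as I. virginica, about 110 times the effect of petal length. Neither 95% confidence interval for the odds ratios contains 1, however, so we can conclude that both effects are statistically significant.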
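Odds ratios and their confidence intervals come from exponentiating the coefficients and the interval endpoints. The sketch below refits the model under the assumed construction of `irises` and `logit.fit` (versicolor and virginica only, both petal measurements as predictors) and uses Wald intervals from `confint.default()`; the source does not say which kind of interval it reports, and the names `odds.ratios`, `or.ci`, and `or.ratio` are illustrative.

```r
# Assumed data and model, matching the names used elsewhere in the article
irises <- droplevels(subset(iris, Species != "setosa"))
logit.fit <- glm(Species ~ Petal.Length + Petal.Width,
                 data = irises, family = binomial)

# Odds ratios: exponentiated coefficients
odds.ratios <- exp(coef(logit.fit))

# 95% Wald confidence intervals, transformed to the odds-ratio scale
or.ci <- exp(confint.default(logit.fit, level = 0.95))
cbind(OR = odds.ratios, or.ci)

# How many times larger the petal-width odds ratio is than the petal-length one
or.ratio <- unname(odds.ratios["Petal.Width"] / odds.ratios["Petal.Length"])
or.ratio
```

An interval on the odds-ratio scale that excludes 1 corresponds to a coefficient interval that excludes 0.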

library(ggplot2)
# Plot the data
qplot(Petal.Width, Petal.Length, colour = Species, data = irises, main = "Iris classification")

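Using the coefficient estimates from the model, we can determine a criterion, a linear discriminant, that best separates the data. The accuracy of this linear discriminant is given in the following confusion matrix: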

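# Get predictions from the model (a linear predictor above 0 means a fitted probability above 0.5)
logit.predictions <- ifelse(predict(logit.fit) > 0, 'virginica', 'versicolor')
# Confusion matrix: rows are the true species, columns are the predictions
table(irises[,5], logit.predictions)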
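The linear discriminant is the line on which the linear predictor equals zero, b0 + b1 * Petal.Length + b2 * Petal.Width = 0. The sketch below solves that equation for petal length, draws the line over the data, and computes the misclassification rate from the confusion matrix. It assumes the same construction of `irises` and `logit.fit` as elsewhere; `boundary.intercept`, `boundary.slope`, `confusion`, and `error.rate` are illustrative names.

```r
# Assumed data and model, matching the names used in the article
irises <- droplevels(subset(iris, Species != "setosa"))
logit.fit <- glm(Species ~ Petal.Length + Petal.Width,
                 data = irises, family = binomial)

# Solve b0 + b1 * Petal.Length + b2 * Petal.Width = 0 for Petal.Length
b <- coef(logit.fit)
boundary.intercept <- unname(-b["(Intercept)"] / b["Petal.Length"])
boundary.slope     <- unname(-b["Petal.Width"] / b["Petal.Length"])

# Scatter plot with the decision boundary as a dashed line
plot(irises$Petal.Width, irises$Petal.Length,
     col = c("blue", "red")[as.integer(irises$Species)],
     xlab = "Petal width", ylab = "Petal length")
abline(a = boundary.intercept, b = boundary.slope, lty = 2)

# In-sample confusion matrix and misclassification rate
logit.predictions <- ifelse(predict(logit.fit) > 0, "virginica", "versicolor")
confusion <- table(irises$Species, logit.predictions)
error.rate <- 1 - sum(diag(confusion)) / sum(confusion)
error.rate
```

Because the rate is computed on the same data used to fit the model, it is likely to be optimistic relative to the error on new observations.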

Diagnostics

By examining the residuals and the influence of each data point, we identified several potentially anomalous observations:

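Among all the potentially problematic observations, we note that observation 57 may be an outlier. The diagnostic plots show the patterns typical of logistic regression, including two distinct curves in the residuals-versus-fitted plot. Observation 57 stands out in every diagnostic plot but, fortunately, does not exceed the Cook's distance threshold.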
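Influence measures of this kind are available in base R. The sketch below assumes the same construction of `irises` and `logit.fit` as elsewhere; the cut-offs (|standardized residual| > 2 and Cook's distance > 4/n) are common rules of thumb, not thresholds stated in the source. Under this construction the row labels keep the numbering of the full `iris` data (51 to 150), which may differ from the numbering the source uses for observation 57.

```r
# Assumed data and model, matching the names used in the article
irises <- droplevels(subset(iris, Species != "setosa"))
logit.fit <- glm(Species ~ Petal.Length + Petal.Width,
                 data = irises, family = binomial)

# Per-observation diagnostics
cooks     <- cooks.distance(logit.fit)  # influence on the fitted coefficients
std.resid <- rstandard(logit.fit)       # standardized deviance residuals
leverage  <- hatvalues(logit.fit)       # leverage of each observation

# The five observations with the largest Cook's distance
head(sort(cooks, decreasing = TRUE), 5)

# Observations flagged by either rule of thumb (assumed cut-offs)
flagged <- which(abs(std.resid) > 2 | cooks > 4 / nrow(irises))
flagged

# The four standard diagnostic plots for the fitted model
par(mfrow = c(2, 2))
plot(logit.fit)
```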

Conclusion and Discussion

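The logistic model is instructive here because it shows the power of a technique that classifies data into a binary dependent variable from several predictors. Predictably, the model shows its greatest uncertainty where observations lie close to the mean in a given dimension, that is, near the boundary between one species' data and the other's. It is interesting to consider whether the model could be improved, or whether a different model would fit the data better; perhaps a k-nearest-neighbours approach is called for in this classification problem. In any case, a misclassification rate of 6% is actually quite good; more data would surely improve that figure.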
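As a rough illustration of the k-nearest-neighbours alternative, the sketch below uses `knn.cv()` from the `class` package (one of R's recommended packages) for leave-one-out classification on the same two predictors. The choice k = 5, the scaling step, and the names `features`, `knn.pred`, and `knn.error` are assumptions for illustration and are not part of the original analysis.

```r
library(class)  # provides knn.cv()

# Assumed data, matching the name used in the article
irises <- droplevels(subset(iris, Species != "setosa"))

# Standardize the predictors so both contribute equally to the distance
features <- scale(irises[, c("Petal.Length", "Petal.Width")])

set.seed(1)  # knn.cv() breaks ties at random
# Leave-one-out predictions: each flower is classified by its 5 nearest neighbours
knn.pred <- knn.cv(train = features, cl = irises$Species, k = 5)

# Confusion matrix and leave-one-out misclassification rate
knn.confusion <- table(irises$Species, knn.pred)
knn.confusion
knn.error <- mean(knn.pred != irises$Species)
knn.error
```

This leave-one-out rate is not directly comparable with the in-sample rate of the logistic model, because each prediction here excludes the observation being classified.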

Self-test exercise

Diagnosis of Depression in Primary Care
To study factors related to the diagnosis of depression in primary care, 400 patients were randomly selected and the following variables were recorded:
DAV: Diagnosis of depression in any visit during one year of care.
0 = Not diagnosed
1 = Diagnosed
PCS: Physical component of SF-36 measuring health status of the patient.
MCS: Mental component of SF-36 measuring health status of the patient.
BECK: The Beck depression score.
PGEND: Patient gender
0 = Female
1 = Male
AGE: Patient’s age in years.
EDUCAT: Number of years of formal schooling.
The response variable is DAV (0 not diagnosed, 1 diagnosed), and it is recorded in the first column of the data. The data are stored in the file final.dat and are available from the course web site. Perform a multiple logistic regression analysis of these data using SAS or any other statistical package. This includes estimation, hypothesis testing, model selection, residual analysis and diagnostics. Explain your findings in a 3 to 4-page report. Your report may include the following sections:
• Introduction: Statement of the problem.
• Material and Methods: Description of the data and methods that you used for the analysis.
• Results: Explain the results of your analysis in detail. You may cut and paste some of your computer outputs and refer to them in the explanation of your results.
• Conclusion and Discussion: Highlight the main findings and discuss.
Please cut and paste the computer outputs into your report and do not include any direct computer output as an attachment.
Please note that you also have the option of using a similar data set in your own field of interest.
