小白勿入,小果教你做lasso回歸分析
爾云間? 一個(gè)專門做科研的團(tuán)隊(duì)
我們?cè)谕ㄟ^(guò)Cox單因素分析,得到數(shù)量較多的基因與生存數(shù)據(jù)相關(guān),將全部的基因納入后續(xù)分析是不合適的,我們需要通過(guò)lasso回歸進(jìn)一步的篩選基因,用于后續(xù)的模型構(gòu)建,所以今天小果開(kāi)始著手進(jìn)行l(wèi)asso回歸,開(kāi)始敲代碼
?

代碼如下:?
?
remotes::install_github("rpkgs/Ipaper")#安裝Ipaper包
library("Ipaper")
library('glmnet')
?
?
data=read.table("training_logFC0.585.CSV",sep = ",",header = T,row.names = 1)
?
x_all <- subset(data, select = -c(OS, OS.time)) #基因表達(dá)矩陣
?
y_all <- subset(data, select = c(OS, OS.time))#生存數(shù)據(jù)
cvfit = cv.glmnet(as.matrix(x_all), Surv(y_all$OS.time,y_all$OS),nfold=10,family = "cox")
?
png(filename = "lasso1.png", height = 450, width = 600)
plot(cvfit, las =1)
dev.off()
pdf(file = "lasso1.pdf", height = 5)
plot(cvfit, las =1)
dev.off()
plot(cvfit)
fit <- glmnet(as.matrix(x_all), Surv(y_all$OS.time,y_all$OS),
????????????? family = "cox")
plot(fit, xvar = "norm", label = TRUE)
?
help("glmnet")
png(filename = "lasso2.png", height = 450, width = 600)
plot(fit, xvar = "lambda",label = TRUE, las=1)
?
dev.off()
pdf(file = "lasso2.pdf", height = 5)
plot(fit, xvar = "lambda",label = TRUE, las=1)
dev.off()
?
?
?
coef.min = coef(cvfit, s = "lambda.min")? ## lambda.min & lambda.1se 取一個(gè)
?
active.min = which(coef.min@i != 0) ## 找出那些回歸系數(shù)沒(méi)有被懲罰為0的
?
lasso_geneids <- coef.min@Dimnames[[1]][coef.min@i+1] ## 提取基因名稱
?
write(lasso_geneids, "lasso_genes.csv")
?
?結(jié)果展示
lasso1.pdf如下,兩條虛豎線間的基因就是我們篩選到的基因

Lasso2.pdf如下,我們關(guān)注的基因是隨著 -log lambda 的增加,coefficients值,最晚歸零的基因

lasso_genes.csv 如下

?

好了,以上就是今天的內(nèi)容,歡迎大家留言討論交流!