一文拿捏,泛癌之腫瘤與正常組織甲基化分析
泛癌分析,依然是比較火熱的話題,尤其結(jié)合多組學(xué)的方式來(lái)進(jìn)行研究,成為了熱點(diǎn)中的熱點(diǎn),跟著時(shí)代的潮流,今天小云想為大家分享一下如何進(jìn)行泛癌中腫瘤與正常組織甲基化分析,接下來(lái)我們開(kāi)始今天的學(xué)習(xí)。
1.?為何做泛癌中腫瘤與正常組織甲基化分析
啟動(dòng)子區(qū)DNA甲基化標(biāo)記在基因表達(dá)的轉(zhuǎn)錄調(diào)控中有著很重要的地位,參與了很多生物學(xué)過(guò)程,因此通過(guò)甲基化分析結(jié)合轉(zhuǎn)錄表達(dá),可以更好的來(lái)解釋生物學(xué)意義,了解相關(guān)的調(diào)控機(jī)制。
今天小云通過(guò)數(shù)據(jù)下載,提取感興趣基因集在腫瘤組織和正常組織中甲基化數(shù)據(jù),最后繪制好看的氣泡圖,展示出不同腫瘤組織中感興趣基因的甲基化狀態(tài),一文拿捏,泛癌甲基化分析,下面就跟著小云的節(jié)奏開(kāi)始今天的學(xué)習(xí)之旅吧!
2.?準(zhǔn)備需要的R包
#安裝需要的R包
BiocManager::install("ChAMP")
BiocManager::install("ChAMPdata")
BiocManager::install("ComplexHeatmap")
install.packages("ggpubr", "randomcoloR")
install.packages("ggplot2")
install.packages("data.table")
BiocManager::install("impute")
#加載需要的R包
library(ggplot2)
library(ChAMP)
library(ChAMPdata)
library(data.table)
library(randomcoloR)
library(ggpubr)
library(impute)
library(ComplexHeatmap)
3.?數(shù)據(jù)準(zhǔn)備
在進(jìn)行泛癌甲基化分析之前,小伙伴最關(guān)心的問(wèn)題是我們需要準(zhǔn)備哪些文件,以及這些文件該如何下載,不用擔(dān)心,小云幫你解決數(shù)據(jù)下載問(wèn)題,其實(shí)很簡(jiǎn)單,我們只需要泛癌樣本信息文件,甲基化探針注釋文件,泛癌甲基化文件,接下來(lái)小云給大家最一下展示奧!
#樣本信息文件來(lái)自該網(wǎng)站https://gdc.cancer.gov/about-data/publications/pancanatlas,可以直接下載。
merged_sample_quality_annotations.tsv,只需要關(guān)注前三列信息就可以,其他列可以忽略。

#甲基化探針注釋文件,來(lái)自ChAMP R包內(nèi)置數(shù)據(jù),記得需要加載一下就可以呀。
data("probe.features")

#泛癌甲基化文件,第一列為探針名,其他列為樣本名,需要泛癌甲基化文件的小伙伴可以聯(lián)系小云(ps:此處只列舉了一種腫瘤組織的甲基化文件)。

4.泛癌甲基化分析
在準(zhǔn)備好了數(shù)據(jù)后,小云帶著大家進(jìn)行今天的重頭戲-泛癌甲基化分析,主要包括提取感興趣基因集的啟動(dòng)子探針信息,通過(guò)提取的啟動(dòng)子探針信息,分別提取不同腫瘤組織中感興趣基因集的甲基化數(shù)據(jù)等步驟,通過(guò)該分析小伙伴可以掌握很多數(shù)據(jù)格式處理技巧和方法,值的仔細(xì)閱讀,小云強(qiáng)烈推薦哈。
# 獲得同時(shí)有腫瘤和正常樣本的腫瘤名
tumors <- c("BLCA","BRCA","CESC","CHOL","COAD",
????????????"ESCA","GBM","HNSC","KICH","KIRC",
????????????"KIRP","LIHC","LUAD","LUSC","PAAD",
????????????"PRAD","READ","STAD","THCA","UCEC")
# 獲得感興趣的基因集
frg<- c("CDKN1A","HSPA5","TTC35","SLC7A11","NFE2L2","MT1G","HSPB1","GPX4","FANCD2","CISD1","FDFT1","SLC1A5","SAT1", ???"TFRC","RPL8","NCOA4","LPCAT3","GLS2","DPP4","CS","CARS","ATP5G3","ALOX15","ACSL4","EMC2")
# 讀取樣本信息文件
rawAnno <- read.delim("merged_sample_quality_annotations.tsv",sep = "\t",row.names = NULL,check.names = F,stringsAsFactors = F,header = T) # 數(shù)據(jù)來(lái)自PanCanAtlas
#提取1-15個(gè)字符串
rawAnno$simple_barcode <- substr(rawAnno$aliquot_barcode,1,15)
#去除重復(fù)的樣本名
samAnno <- rawAnno[!duplicated(rawAnno$simple_barcode),c("cancer type", "simple_barcode")]
#去除cancer type 為空的數(shù)據(jù)
samAnno <- samAnno[which(samAnno$`cancer type` != ""),]
write.table(samAnno,"simple_sample_annotation.txt",sep = "\t",row.names = F,col.names = T,quote = F)
# 提取匹配到FRG基因的啟動(dòng)子探針
promoter <- probe.features[which(probe.features$feature %in% c("TSS1500","TSS200")),] # 根據(jù)注釋文件篩選啟動(dòng)子探針
# 取出感興趣基因在感興趣位點(diǎn)的探針
promoter <- promoter[which(promoter$gene %in% frg),]
write.table(promoter, "promoter_annotation_for_interested_genes.txt",sep = "\t",row.names = T,col.names = NA,quote = F)
# 獲取只有感興趣探針/基因的甲基化子集
for (i in tumors) { # 比較慢請(qǐng)耐心
??message("--",i,"...")
??# 加載甲基化RData文件,保存該文件夾的路徑
load(file.path("/media/desk16/wangd/pp85/tcga_methy450",paste0("TCGA-",i,"_methy450.Rdata")))
??#將數(shù)據(jù)類型轉(zhuǎn)換為數(shù)據(jù)框
??methy450 <- as.data.frame(methy450)
??#將第一列設(shè)為行名
??rownames(methy450) <- methy450[,1]
??# 去掉第一列探針名
??methy450 <- methy450[,-1]
??# 獲取行名和列名
??dimname <- dimnames(methy450)
??## 將數(shù)據(jù)全部轉(zhuǎn)為數(shù)值以防報(bào)錯(cuò)
??methy450 <- sapply(methy450, as.numeric)
??# 重新賦值行名和列名
??dimnames(methy450) <- dimname
??#將數(shù)據(jù)類型轉(zhuǎn)換為數(shù)據(jù)框
??methy450 <- as.data.frame(methy450)
??# 獲取甲基化數(shù)據(jù)和感興趣探針的交集
??compb <- intersect(rownames(methy450),rownames(promoter))
??methy450 <- methy450[compb,] # 取出子集
???# 將探針名和基因名進(jìn)行匹配
??methy450$gene <- promoter[compb,"gene"]
??# 如果基因匹配多個(gè)探針,則取中位數(shù)beta值
??methy450 <- apply(methy450[,setdiff(colnames(methy450), "gene")], 2, function(x) tapply(x, INDEX=factor(methy450$gene), FUN=median, na.rm=TRUE))
??methy450 <- as.data.frame(methy450)
??write.table(methy450, paste0("TCGA_",i,"_methy450_subset.txt"),sep = "\t",row.names = T,col.names = NA,quote = F)
}
deltaMeth <- NULL
for (i in tumors) {
??message("--",i,"...")
??# 獲取只包含感興趣探針/基因的甲基化數(shù)據(jù)
??meth_subset <- read.table(paste0("TCGA_",i,"_methy450_subset.txt"),sep = "\t",row.names = 1,check.names = F,stringsAsFactors = F,header = T)
??#提取樣本名1-15個(gè)字符串
??colnames(meth_subset) <- substr(colnames(meth_subset),1,15)
??#根據(jù)TCGA命名規(guī)律提取腫瘤和正常組織樣本
??tumsam <- colnames(meth_subset)[substr(colnames(meth_subset),14,14) == "0"] # 獲取腫瘤樣本
??norsam <- colnames(meth_subset)[substr(colnames(meth_subset),14,14) == "1"] # 獲取正常樣本
??outTab <- NULL
??for (j in rownames(meth_subset)) {
????if(i == "KICH") { # KICH沒(méi)有正常的甲基化樣本
??????delta <- 0 # 如果在分析KICH則delta設(shè)置為0
??????wt <- 1 # 如果在分析KICH則設(shè)置pvalue為1
??????outTab <- rbind.data.frame(outTab,
?????????????????????????????????data.frame(gene = j,
????????????????????????????????????????????tumor = i,
????????????????????????????????????????????Delta = delta,
????????????????????????????????????????????Pvalue = wt,
????????????????????????????????????????????stringsAsFactors = F),
?????????????????????????????????stringsAsFactors = F)
????} else {
??????tmp1 <- as.numeric(meth_subset[j,tumsam]) # 獲取腫瘤樣本的beta值
??????tmp2 <- as.numeric(meth_subset[j,norsam]) # 獲取正常樣本的beta值
??????# wilcox檢驗(yàn)
??????wt <- wilcox.test(tmp1,tmp2)$p.value
??????avgt <- mean(tmp1) # 腫瘤樣本的平均beta值
??????avgn <- mean(tmp2) # 正常樣本的平均beta值
??????delta <- avgt - avgn # delta值由腫瘤樣本減去正常樣本計(jì)算得到
??????#合并所有腫瘤組織的甲基化分析結(jié)果文件
??????outTab <- rbind.data.frame(outTab,
?????????????????????????????????data.frame(gene = j,
????????????????????????????????????????????tumor = i,
????????????????????????????????????????????Delta = delta,
????????????????????????????????????????????Pvalue = wt,
????????????????????????????????????????????stringsAsFactors = F),
?????????????????????????????????stringsAsFactors = F)
????}
??}
??deltaMeth <- rbind.data.frame(deltaMeth,
????????????????????????????????outTab,
????????????????????????????????stringsAsFactors = F)
}
write.table(deltaMeth, "TCGA_pancan_delta_meth_subset.txt",sep = "\t",row.names = F,col.names = T,quote = F)
5.開(kāi)始繪圖
# 設(shè)置顏色
blue <- "#4577FF"
red <- "#C2151A"
orange <- "#E45737"
green <- "#6F8B35"
darkblue <- "#303B7F"
darkred <- "#D51113"
yellow <- "#EECA1F"
# 產(chǎn)生泡泡圖
#確定基因的繪圖順序
deltaMeth$gene<-factor(deltaMeth$gene, ??????????????????levels=rev(c("CDKN1A","HSPA5","TTC35","SLC7A11","NFE2L2","MT1G","HSPB1","GPX4","FANCD2","CISD1", ??????????????????????????????????????"FDFT1","SLC1A5","SAT1","TFRC","RPL8","NCOA4","LPCAT3","GLS2","DPP4","CS","CARS","ATP5G3","ALOX15","ACSL4")))
#設(shè)置漸變色
my_palette <- colorRampPalette(c(blue,"white",orange), alpha=TRUE)(n=128)
#ggplot開(kāi)始繪圖
ggplot(deltaMeth, aes(x=tumor,y=gene)) +
??geom_point(aes(size=-log10(Pvalue),color=Delta)) +
??scale_color_gradientn('Delta(T-N)',
????????????????????????colors=my_palette) +
??#修改主題
??theme_bw() +
??theme(#panel.grid.minor = element_blank(),
????????#panel.grid.major = element_blank(),
????????axis.text.x = element_text(angle = 45, size = 12, hjust = 0.3, vjust = 0.5, color = "black"),
????????axis.text.y = element_text(size = 12, color = rep(c(red,blue),c(14,10))),
????????axis.title = element_blank(),
????????panel.border = element_rect(size = 0.7, linetype = "solid", colour = "black"),
????????legend.position = "right",
????????plot.margin = unit(c(1,1,1,1), "lines"))
#保存圖片
ggsave("tumor_and_normal_promoter_methylation_of_interested_genes_in_pancancer.pdf", width = 8,height = 6)
6.結(jié)果文件
(1)、?promoter_annotation_for_interested_genes.txt
該文件為提取感興趣基因集的啟動(dòng)子探針信息文件(TSS1500,TS200)。

(2)、TCGA_BLCA_methy450_subset.txt
該文件為特定腫瘤組織中感興趣基因集的甲基化數(shù)據(jù)文件。

(3)、.TCGA_pancan_delta_meth_subset.txt
該文件為計(jì)算的感興趣基因集在泛癌中的甲基化delta值和Pvalue值文件

(4)、.simple_sample_annotation.txt
該文件為簡(jiǎn)化的泛癌樣本信息文件。

(5)、.tumor_and_normal_promoter_methylation_of_interested_genes_in_pancancer.pdf
該圖為感興趣基因集在泛癌中的甲基化狀態(tài)點(diǎn)圖,橫坐標(biāo)為腫瘤組織名稱,縱坐標(biāo)為感興趣基因名,點(diǎn)的顏色來(lái)表示甲基化Delta值(腫瘤組織-正常組織),點(diǎn)的大小表示Pvalue值。

最終,小云一文完成了泛癌中腫瘤和正常組織中甲基化分析,泛癌分析的其他相關(guān)內(nèi)容可以嘗試使用本公司新開(kāi)發(fā)的云平臺(tái)生物信息分析小工具,零代碼完成分析,云平臺(tái)網(wǎng)址:http://www.biocloudservice.com/home.html,主要包括泛癌中特定通路基因的突變分析(http://www.biocloudservice.com/740/740.php),泛癌中拷貝數(shù)變異分析(http://www.biocloudservice.com/700/700.php),基因在腫瘤與正常組織中差異分析(http://www.biocloudservice.com/710/710.php)等相關(guān)泛癌分析小工具;該分析在服務(wù)器運(yùn)行速度會(huì)很快(ps:甲基化文件比較大),小云為大家?guī)?lái)一個(gè)福利奧,購(gòu)買代碼可以免費(fèi)試用本公司服務(wù)器三天,服務(wù)器租賃用戶所有付費(fèi)代碼可以折扣獲取,有需要的可以聯(lián)系小云哈!

