最美情侣中文字幕电影,在线麻豆精品传媒,在线网站高清黄,久久黄色视频

歡迎光臨散文網(wǎng) 會(huì)員登陸 & 注冊(cè)

觸類旁通!一文掌握泛癌之感興趣基因集SNV分析

2023-06-15 10:18 作者:爾云間  | 我要投稿

Pancancer一詞,在生信文章中出現(xiàn)的頻率越來(lái)越高,已經(jīng)成為一個(gè)網(wǎng)紅分析內(nèi)容,趁著今天的機(jī)會(huì),今天小云想為大家分享一下泛癌分析之感興趣基因集SNV分析,接下來(lái)小云帶著大家開(kāi)始今天的學(xué)習(xí)之旅。


1.?如何進(jìn)行泛癌中感興趣基因集SNV分析?

在進(jìn)行分析之前,小云想為大家科普一下何為SNV,SNV為單核苷酸多態(tài)性,主要包括在coding區(qū)域和非coding區(qū)域發(fā)生突變,編碼區(qū)的SNV可能影響蛋白質(zhì)序列,而非編碼區(qū)的的SNV可能影響基因的表達(dá)和剪接過(guò)程,這就是小云對(duì)SNV的簡(jiǎn)單介紹哈!

接下來(lái)小云為大家介紹一哈咋進(jìn)行該分析,其實(shí)很簡(jiǎn)單,只需要下載pancer突變數(shù)據(jù),以及自己感興趣的基因集,就可以來(lái)統(tǒng)計(jì)感興趣基因集在不同腫瘤組織中發(fā)生突變的頻率,以及發(fā)生的突變類型,小云已經(jīng)迫不及待想開(kāi)始實(shí)操啦,那就開(kāi)始吧!


2.需要準(zhǔn)備的R包

#安裝需要的R包

install.packages("ggplot2")

install.packages("readxl")

install.packages("stringr")

install.packages("dplyr")

install.packages("tidyverse")

install.packages("reshape2")

install.packages("RcolorBrewer")

install.packages("magrittr")

BiocManager::install("maftools")

#導(dǎo)入需要的R包

library(magrittr)

library(readxl)

library(stringr)

library(ggplot2)

library(maftools)

library(dplyr)

library(reshape2)

library(tidyverse)

library(RColorBrewer)


3.?數(shù)據(jù)準(zhǔn)備

在進(jìn)行感興趣基因在泛癌中的突變情況時(shí),我們只需要兩個(gè)文件就可以順利的完成分析,包括TCGA pancancer突變數(shù)據(jù)和TCGA pancancer 樣本信息文件,小云為大家提供了數(shù)據(jù)下載網(wǎng)址:https://gdc.cancer.gov/about-data/publications/pancanatlas,下載起來(lái)so esay啦!,小云并為大家貼上該網(wǎng)址,方便大家迅速完成數(shù)據(jù)準(zhǔn)備工作。


#TCGA pancancer突變數(shù)據(jù)下載

mc3.v0.2.8.PUBLIC.maf.gz,泛癌突變數(shù)據(jù)文件,需要關(guān)注基因名,變異類型以及樣本名這三列信息。


#泛癌樣本信息文件下載

merged_sample_quality_annotations.tsv,只需要關(guān)注前三列數(shù)據(jù)就可以,樣本信息和腫瘤類型。


?4.?泛癌中感興趣基因集SNV分析

如果小伙伴已經(jīng)下載好了所需要的數(shù)據(jù),小云將通過(guò)樣本信息文件,泛癌突變數(shù)據(jù)文件和感興趣基因集,來(lái)提取不同腫瘤組織中目標(biāo)基因集發(fā)生的突變文件,統(tǒng)計(jì)目標(biāo)基因發(fā)生在不同腫瘤中的突變頻率和突變類型,通過(guò)該分析小伙伴會(huì)掌握很多數(shù)據(jù)處理的小技巧,小云強(qiáng)烈安利小伙伴認(rèn)真學(xué)習(xí),真的是干活滿滿。

# 修正TCGA名稱

#讀取樣本信息文件

rawAnno <- read.delim("merged_sample_quality_annotations.tsv",sep = "\t",row.names = NULL,check.names = F,stringsAsFactors = F,header = T)

#提取1-15位字符串

rawAnno$simple_barcode <- substr(rawAnno$aliquot_barcode,1,15)

#去除重復(fù)的樣本名

samAnno<- rawAnno[!duplicated(rawAnno$simple_barcode),c("cancer?type", "simple_barcode")]

#去除cancer type 為空的數(shù)據(jù)

samAnno <- samAnno[which(samAnno$`cancer type` != ""),]

#保存簡(jiǎn)化的樣本信息文件

write.table(samAnno,"simple_sample_annotation.txt",sep = "\t",row.names = F,col.names = T,quote = F)

# 加載泛癌突變MAF文件,讀取文件有點(diǎn)慢,耐心等待奧。

panmaf <- read_tsv("mc3.v0.2.8.PUBLIC.maf.gz", comment = "#")

# 感興趣基因列表

genelist<- c("PER3","PER2","GPR50","CYP2C19","MTNR1B","CLOCK","CYP1A2","RORA","ARNTL","MTNR1A","ASMT","AANAT")

#codeing區(qū)域

vc_nonSyn <- c("Frame_Shift_Del",

???????????????"Frame_Shift_Ins",

???????????????"Splice_Site",

???????????????"Translation_Start_Site",

???????????????"Nonsense_Mutation",

???????????????"Nonstop_Mutation",

???????????????"In_Frame_Del",

???????????????"In_Frame_Ins",

???????????????"Missense_Mutation")

## 非coding區(qū)域

vc_Syn <- c("3'UTR",

????????????"5'UTR",

????????????"3'Flank",

????????????"5'Flank",

????????????"IGR",

????????????"Intron",

????????????"Silent")

# 獲取不同癌種的樣本數(shù)

mafsam <- data.frame(samID = panmaf$Tumor_Sample_Barcode,

?????????????????????simple_barcode = substr(panmaf$Tumor_Sample_Barcode,1,15),

?????????????????????stringsAsFactors = F)

#去除重復(fù)的樣本

mafsam <- mafsam[!duplicated(mafsam$simple_barcode),]

#合并兩個(gè)文件

mafsam <- merge(mafsam,samAnno, by = "simple_barcode", all.x = TRUE)

mafsam <- as.data.frame(na.omit(mafsam))

?

txt <- data.frame(label = paste0(names(table(mafsam$`cancer type`))," (n=", as.numeric(table(mafsam$`cancer type`)),")"),

??????????????????"cancer type" = names(table(mafsam$`cancer type`)),

??????????????????check.names = F) # 將腫瘤和樣本數(shù)標(biāo)簽合并

#合并兩個(gè)文件

mafsam <- merge(mafsam, txt, by = "cancer type", all.x = TRUE)

rownames(mafsam) <- mafsam$simple_barcode

# 取感興趣的突變子集

panmaf <- panmaf[which(panmaf$Hugo_Symbol %in% genelist),]

panmaf$Tumor_Sample_Barcode <- substr(panmaf$Tumor_Sample_Barcode,1,15)

write.table(panmaf, file = "maf_of_genes_of_interest.txt",sep = "\t",row.names = F,col.names = T,quote = F)

# 創(chuàng)建突變矩陣

panmaf <- read_tsv("maf_of_genes_of_interest.txt", comment = "#")

nonSyncount <- matrix(0, nrow = length(genelist), ncol = length(mafsam$simple_barcode),

???????????????????dimnames = list(genelist, mafsam$simple_barcode))

Syncount <- matrix(0, nrow = length(genelist), ncol = length(mafsam$simple_barcode),

???????????????????dimnames = list(genelist, mafsam$simple_barcode))

nonSyncount <- as.data.frame(t(nonSyncount))

Syncount <- as.data.frame(t(Syncount))

#通過(guò)循環(huán)計(jì)算感興趣基因在每種腫瘤中出現(xiàn)的coding和非coding的數(shù)目

for (i in genelist) {

??message(i," starts...")

??submaf <- panmaf[which(panmaf$Hugo_Symbol == i),] # 取出當(dāng)前基因的突變

??for (j in 1:nrow(submaf)) { # 循環(huán)每一行

????tmp <- submaf[j,]

????mutGene <- i # 獲取當(dāng)前突變的基因

????mutSam <- tmp$Tumor_Sample_Barcode # 獲取當(dāng)前突變的樣本

????if(is.element(tmp$Variant_Classification, vc_nonSyn)) { # 如果突變類型為coding類

??????nonSyncount[mutSam, mutGene] <- 1 # 則coding計(jì)數(shù)矩陣設(shè)為1

????}

????if(is.element(tmp$Variant_Classification, vc_Syn)) { # 如果突變類型為非coding類

??????Syncount[mutSam, mutGene] <- 1 # 則非coding計(jì)數(shù)矩陣設(shè)為1

????}

??}

}

# 根據(jù)癌種計(jì)算累積突變樣本量

nonSyncount$tumor <- mafsam[rownames(nonSyncount),"label"]

Syncount$tumor <- mafsam[rownames(Syncount),"label"]

nonSyncount.tumor <- as.data.frame(apply(nonSyncount[,setdiff(colnames(nonSyncount), "tumor")], 2, function(x) tapply(x, INDEX=factor(nonSyncount$tumor), FUN=sum, na.rm=TRUE))) # 對(duì)同一種癌種的非沉默突變求和

Syncount.tumor <- as.data.frame(apply(Syncount[,setdiff(colnames(Syncount), "tumor")], 2, function(x) tapply(x, INDEX=factor(Syncount$tumor), FUN=sum, na.rm=TRUE))) # 對(duì)同一種癌種的沉默突變求和

mutCount <- matrix(NA, nrow = length(genelist), ncol = length(unique(mafsam$label)),

???????????????????dimnames = list(genelist, unique(mafsam$label)))

mutCount <- as.data.frame(mutCount)

mutFreq <- mutCount

#通過(guò)循環(huán)計(jì)算感興趣基因在每種腫瘤中出現(xiàn)的coding和非coding的突變頻率。

for (i in genelist) {

??for(j in unique(mafsam$label)) {

????tumor <- sapply(strsplit(j," (", fixed = T),"[",1)

????if(nonSyncount.tumor[j,i] > 0) { # 若非沉默突變存在

??????mutCount[i,j] <- nonSyncount.tumor[j,i] # 記錄該癌種的非沉默突變數(shù)

??????mutFreq[i,j] <- nonSyncount.tumor[j,i]/as.numeric(table(mafsam$`cancer type`)[tumor]) * 100 # 根據(jù)該癌種的樣本數(shù)計(jì)算突變頻率(個(gè)人認(rèn)為原文并沒(méi)有如此計(jì)算,原文的顏色映射似乎是按照突變數(shù)目來(lái)的,而不是突變頻率)

????}

????if(nonSyncount.tumor[j,i] == 0 & Syncount.tumor[j,i] > 0) { # 若非沉默突變不存在,而沉默突變存在

??????mutCount[i,j] <- 0 # 則記錄該癌種的突變數(shù)目為0

??????mutFreq[i,j] <- 0 # 突變頻率為0

????}

????if(nonSyncount.tumor[j,i] == 0 & Syncount.tumor[j,i] == 0) { # 若非沉默突變不存在,沉默突變也不存在

??????mutCount[i,j] <- NA # 則此單元為空值

??????mutFreq[i,j] <- 0 # 突變頻率為0

????}

??}

}


5.繪制熱圖和瀑布圖

#長(zhǎng)寬數(shù)據(jù)轉(zhuǎn)換

ggCount <- melt(as.matrix(mutCount), varnames = c("Cancer", "Gene"), as.is = T)

ggFreq <- melt(as.matrix(mutFreq), varnames = c("Cancer", "Gene"), as.is = T)

#合并兩個(gè)文件

ggData <- cbind.data.frame(ggCount, Freq = ggFreq$value)

#設(shè)置列名

colnames(ggData) <- c("Cancer","Gene","Count","Freq")

#開(kāi)始繪制熱圖

p <-

??ggplot(data = ggData) +

??geom_tile(mapping = aes(Gene, Cancer, fill = Freq), col = "grey80") +

??geom_text(mapping = aes(Gene, Cancer, label = Count), size =3) +

??scale_fill_gradient(low = "white","high" = "red") +

??scale_x_discrete(position = "top") +

??labs(fill = "Mutation Frequency (%)") +

??theme_bw() + #theme(axis.text.x = element_text(angle = 45))

??theme(axis.title = element_blank(),

????????axis.text.x = element_text(hjust = 0, size = 10, color = "black", angle = 45),

????????axis.text.y = element_text(hjust = 1, size = 10, color = "black"),

????????axis.ticks = element_line(size=0.2, color="black"),

????????axis.ticks.length = unit(0.2, "cm"),

????????panel.background = element_blank(),

????????panel.grid = element_blank())

ggsave(p, filename = "frequency_heatmap.pdf", width = 10,height = 4)

#繪制瀑布圖

# 讀取感興趣的基因子集突變文件

panmaf <- read.maf(maf = "maf of genes of interest.txt",

???????????????????removeDuplicatedVariants = TRUE,

???????????????????isTCGA = FALSE,

???????????????????vc_nonSyn = c("Frame_Shift_Del",

?????????????????????????????????"Frame_Shift_Ins",

?????????????????????????????????"Splice_Site",

?????????????????????????????????"Translation_Start_Site",

?????????????????????????????????"Nonsense_Mutation",

?????????????????????????????????"Nonstop_Mutation",

?????????????????????????????????"In_Frame_Del",

?????????????????????????????????"In_Frame_Ins",

?????????????????????????????????"Missense_Mutation"))

mafAnno<-mafsam[which(mafsam$simple_barcode%in%panmaf@data$Tumor_Sample_Barcode),]

colnames(mafAnno)[1:2] <- c("Cancer","Tumor_Sample_Barcode")

# 設(shè)置突變顏色

mutcol <- brewer.pal(n = 10, name = 'Paired')

names(mutcol)<-c('Frame_Shift_Del','Missense_Mutation','Nonsense_Mutation', 'Frame_Shift_Ins','In_Frame_Ins','Splice_Site','In_Frame_Del','Nonstop_Mutation','Translation_Start_Site','Multi_Hit')

cancercolors<-NMF:::ccRamp(brewer.pal(n=12,name='Paired'),length(unique(mafAnno$Cancer)))

names(cancercolors) <- unique(mafAnno$Cancer)

#繪制瀑布圖

annocolors = list(Cancer = cancercolors)

pdf(file = "oncoprint.pdf", width = 10, height = 5)

oncoplot(maf = panmaf, # MAF文件

?????????colors = mutcol, # 突變類型的顏色

?????????minMut = 0.05 * nrow(mafAnno), # 所顯示的突變至少有5%的突變頻率

?????????bgCol = "grey95", # 瀑布圖背景色

?????????borderCol = NA, # 突變的邊框(由于樣本太多所以不設(shè)置邊框)

?????????annotationDat = mafAnno, # 樣本注釋文件

?????????annotationColor = annocolors,

?????????clinicalFeatures = "Cancer", # 從注釋文件中顯示的信息

?????????sortByAnnotation = T) # 按照mafAnno的第一列排序,即按照Cancer排序

invisible(dev.off())


6.?結(jié)果文件

1.?maf_of_genes_of_interest.txt

該文件為提取的感興趣基因集在泛癌中的突變數(shù)據(jù);重點(diǎn)關(guān)注基因名,變異類型和樣本信息這三列信息。


2.?simple_sample_annotation.txt

該文件為簡(jiǎn)化后的泛癌樣本信息文件,第一列樣本名,第二列腫瘤類型。


3.?frequency_heatmap.pdf

該圖為目標(biāo)基因集在不同腫瘤組織中突變頻率熱圖,橫坐標(biāo)為腫瘤類型以及每種腫瘤所包含的樣本數(shù)目,縱坐標(biāo)為目標(biāo)基因,通過(guò)顏色深淺來(lái)表示突變頻率高低。


4.?frequency_heatmap.pdf

該圖為繪制的感興趣基因在不同腫瘤組織中突變情況的瀑布圖,利用不同的顏色分別來(lái)表示不同腫瘤類型和發(fā)生的變異類型。


小云最終繪制了反映感興趣基因集突變情況的熱圖和瀑布圖,只需要輸入自己感興趣的基因集文件,就可以輕松的完成該分析,泛癌分析的其他分析內(nèi)容可以嘗試使用本公司新開(kāi)發(fā)的云平臺(tái)生物信息分析小工具,零代碼完成分析,云平臺(tái)網(wǎng)址:http://www.biocloudservice.com/home.html;主要包括泛癌中體細(xì)胞拷貝數(shù)變異分析(http://www.biocloudservice.com/704/704.php),基因在腫瘤與正常組織中差異分析(http://www.biocloudservice.com/710/710.php)等相關(guān)泛癌分析小工具,歡迎大家和小云一起討論學(xué)習(xí)奧,下期再見(jiàn)奧。


好了,今天小云的分享就到這里啦,完整版代碼上“云生信學(xué)生物信息學(xué)”公正號(hào)回復(fù)“代碼”領(lǐng)取喲~


觸類旁通!一文掌握泛癌之感興趣基因集SNV分析的評(píng)論 (共 條)

分享到微博請(qǐng)遵守國(guó)家法律
龙山县| 邳州市| 闸北区| 札达县| 娄底市| 韩城市| 鸡泽县| 平安县| 唐山市| 阿城市| 科技| 翁牛特旗| 闵行区| 高雄县| 新晃| 富蕴县| 隆尧县| 榆树市| 南丰县| 平度市| 高安市| 莆田市| 军事| 龙山县| 元朗区| 贵南县| 博客| 湘阴县| 霍林郭勒市| 湄潭县| 洪泽县| 德惠市| 察雅县| 自贡市| 霸州市| 嘉义县| 井研县| 大丰市| 积石山| 钟山县| 高阳县|