A Wondrous Journey into Cells: splatter Makes Single-Cell Sequencing Data Easy! (Part 2)
In the previous tutorial, Xiaoyun showed you how to simulate and analyse single-cell sequencing data with the splatter and scater packages. We covered quickly generating simulated data, estimating parameters, creating custom parameters, extracting data, normalisation and dimensionality reduction. Today we continue with this R package, building on the code from the previous part. If you want to dig deeper, read on! First, let's review the splatter code from last time:

```r
library(splatter)
library(scater)
set.seed(1)

sce <- mockSCE()                  # class: SingleCellExperiment
counts(sce)
params <- splatEstimate(sce)      # Params object

params <- newSplatParams()        # defaults: 10000 genes, 100 cells
getParam(params, "nGenes")
getParam(params, "nCells")
getParams(params, c("nGenes", "mean.rate", "mean.shape"))
params <- setParam(params, "nGenes", 5000)
params <- setParam(params, "batchCells", 50)
params <- setParams(params, mean.shape = 0.5, de.prob = 0.2)
params <- setParams(params, update = list(nGenes = 8000, mean.rate = 0.5))

sim <- splatSimulate(params)
class(sim)
head(rowData(sim))
head(colData(sim))
names(assays(sim))
assays(sim)$CellMeans[1:5, 1:5]

counts <- counts(sim)
counts[1:3, 1:5]
class(counts)                     # "matrix" "array"
typeof(counts)                    # [1] "integer"
dim(counts)                       # 8000 genes x 50 cells, because batchCells = 50

# Log-normalisation and PCA
sim <- logNormCounts(sim)
counts(sim)[1:3, 1:5]
logcounts(sim)[1:3, 1:5]
sim <- runPCA(sim)
plotPCA(sim)

# Simulating groups
sim.groups <- splatSimulate(group.prob = c(0.3, 0.7), method = "groups",
                            verbose = FALSE)
sim.groups <- logNormCounts(sim.groups)
sim.groups <- runPCA(sim.groups)
plotPCA(sim.groups, colour_by = "Group")

dim(counts(sim.groups))
rowData(sim.groups)$DEFacGroup1
rowData(sim.groups)$DEFacGroup2
metadata(sim.groups)

# Simulating differentiation paths
sim.paths <- splatSimulate(de.prob = 0.2, nGenes = 1000, method = "paths",
                           verbose = FALSE)
sim.paths <- logNormCounts(sim.paths)
sim.paths <- runPCA(sim.paths)
plotPCA(sim.paths, colour_by = "Step")
colData(sim.paths)$Step

# Simulating batches
sim.batches <- splatSimulate(batchCells = c(50, 50), verbose = FALSE)
sim.batches <- logNormCounts(sim.batches)
sim.batches <- runPCA(sim.batches)
plotPCA(sim.batches, colour_by = "Batch")
rowData(sim.batches)$BatchFacBatch1
rowData(sim.batches)$BatchFacBatch2
dev.off()                         # close the current graphics device
```

Building on the code above, let's continue exploring what this package can do:
Comparing SingleCellExperiment objects
First, we generate two SingleCellExperiment objects, sim1 and sim2, with two different simulation models: sim1 comes from splatSimulate (the Splat model) and sim2 from simpleSimulate (the Simple model). We then compare the two objects with compareSCEs and store the result in the comparison variable.

```r
### 7. Comparing SingleCellExperiment objects
## Comparing datasets with each other
sim1 <- splatSimulate(nGenes = 1000, batchCells = 20, verbose = FALSE)
sim2 <- simpleSimulate(nGenes = 1000, nCells = 20, verbose = FALSE)
comparison <- compareSCEs(list(Splat = sim1, Simple = sim2))
```

Next, let's see how to access the different elements of the comparison object: RowData and ColData, plus the Means, LibrarySizes and ZerosCell plots stored in Plots.

```r
names(comparison)
comparison$RowData
comparison$ColData

names(comparison$Plots)
comparison$Plots$Means
comparison$Plots$LibrarySizes
comparison$Plots$ZerosCell
```
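The LibrarySizes and ZerosCell panels compare per-cell statistics between the datasets. As a rough, package-independent sketch (base R only, on two hypothetical toy count matrices rather than sim1 and sim2), the kind of quantity involved is the total count per cell and the percentage of genes with zero counts in each cell. compareSCEs computes its own statistics internally; this only illustrates the idea.

```r
# Base-R sketch (no splatter needed): per-cell library size and % zeros
set.seed(1)
n_genes <- 1000
n_cells <- 20

# Two hypothetical toy count matrices (genes x cells) standing in for two simulations
mat_a <- matrix(rpois(n_genes * n_cells, lambda = 2), nrow = n_genes)
mat_b <- matrix(rpois(n_genes * n_cells, lambda = 0.5), nrow = n_genes)

# Per-cell summaries: total counts (library size) and percentage of zero counts
cell_stats <- function(mat, label) {
  data.frame(Dataset = label,
             LibSize = colSums(mat),
             PctZero = 100 * colMeans(mat == 0))
}
stats <- rbind(cell_stats(mat_a, "A"), cell_stats(mat_b, "B"))

# Median of each statistic per dataset
aggregate(cbind(LibSize, PctZero) ~ Dataset, data = stats, FUN = median)
```

Dataset B, with a lower Poisson rate, should show smaller libraries and more zeros per cell, which is the sort of difference these comparison plots make visible.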
Next, the code generates three more SingleCellExperiment objects, sim1, sim2 and sim3. diffSCEs then compares the datasets against the reference dataset "Simple" and stores the result in the difference variable.

```r
## Comparing with a reference
sim1 <- splatSimulate(nGenes = 1000, batchCells = 100, verbose = FALSE)
sim2 <- splatSimulate(nGenes = 1000, batchCells = c(40, 60), verbose = FALSE)
sim3 <- simpleSimulate(nGenes = 1000, nCells = 100, verbose = FALSE)
difference <- diffSCEs(list(Splat1 = sim1, Splat2 = sim2, Simple = sim3),
                       ref = "Simple")
difference$Plots$Means
difference$QQPlots$Means
```
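The QQPlots element pairs up quantiles of each dataset with those of the reference. The idea behind such a quantile-quantile comparison can be sketched in base R with hypothetical gene-mean vectors of equal length, an assumption that matches the equal dimensions (1000 genes, 100 cells) of the three simulations above. This is an illustration of the concept, not how diffSCEs is implemented.

```r
# Base-R sketch of a quantile-quantile comparison against a reference
set.seed(1)
# Hypothetical per-gene means; the test values are on average twice as large
ref_means  <- rgamma(1000, shape = 0.6, rate = 0.3)
test_means <- rgamma(1000, shape = 0.6, rate = 0.15)

# Pair the i-th smallest test value with the i-th smallest reference value
qq <- data.frame(Reference = sort(ref_means), Test = sort(test_means))
# Positive differences mean the test dataset is larger at that quantile
qq$Difference <- qq$Test - qq$Reference

summary(qq$Difference)
```

Base R's qqplot(ref_means, test_means) draws the corresponding QQ plot; points above the identity line correspond to positive differences.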
Batch effect parameters
Next, we learn how to use splatSimulate to generate data with batch effects. The code below generates two SingleCellExperiment objects, sim1 and sim2, with different batch effect parameters (batch.facLoc and batch.facScale). Both objects are log2-normalised and analysed with PCA, and the results are visualised with plotPCA.

```r
library("splatter")
library("scater")
library("ggplot2")

### 8. Batch effect parameters
# params is the SplatParams object created in the recap above

# Simulation with small batch effects
sim1 <- splatSimulate(params, batchCells = c(100, 100),
                      batch.facLoc = 0.001, batch.facScale = 0.001,
                      verbose = FALSE)
sim1 <- logNormCounts(sim1)
sim1 <- runPCA(sim1)
plotPCA(sim1, colour_by = "Batch") + ggtitle("Small batch effects")

# Simulation with big batch effects
sim2 <- splatSimulate(params, batchCells = c(100, 100),
                      batch.facLoc = 0.5, batch.facScale = 0.5,
                      verbose = FALSE)
sim2 <- logNormCounts(sim2)
sim2 <- runPCA(sim2)
plotPCA(sim2, colour_by = "Batch") + ggtitle("Big batch effects")
```

Let's take a look at the two resulting plots!
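In the Splat model, batch.facLoc and batch.facScale are the location and scale of a log-normal distribution from which each gene's batch factor is drawn. A simplified base-R sketch of why the two settings behave so differently follows; the helper batch_factors is hypothetical and not splatter's code, and the random up/down inversion is an assumption made for illustration.

```r
set.seed(1)
# Hypothetical helper (not splatter's code): one multiplicative batch factor
# per gene, drawn from a log-normal distribution with location fac_loc and
# scale fac_scale; about half the factors are inverted so some genes go down
batch_factors <- function(n_genes, fac_loc, fac_scale) {
  facs <- rlnorm(n_genes, meanlog = fac_loc, sdlog = fac_scale)
  down <- runif(n_genes) < 0.5
  facs[down] <- 1 / facs[down]
  facs
}

small <- batch_factors(1000, fac_loc = 0.001, fac_scale = 0.001)
big   <- batch_factors(1000, fac_loc = 0.5, fac_scale = 0.5)

# Small factors stay within 1% of 1 (barely any batch effect),
# while big factors move many genes far away from 1
range(small)
sd(log(big))
```

Because the small factors are practically 1, the two batches in the first simulation should differ very little; the big factors rescale many genes, which is what makes batches separate in PCA.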
Here we first generate sim1 with batch effects and keep them (batch.rmEffect = FALSE), then log2-normalise it and run PCA. Next, we generate sim2 with batch.rmEffect = TRUE, so the batch effect is removed from the simulated data while the batch labels remain, and process it the same way. Finally, we visualise and compare both results with plotPCA.

```r
# Whether to remove the batch effect
sim1 <- splatSimulate(params, batchCells = c(100, 100), batch.rmEffect = FALSE,
                      verbose = FALSE)
sim1 <- logNormCounts(sim1)
sim1 <- runPCA(sim1)
plotPCA(sim1, colour_by = "Batch") + ggtitle("With batch effects")

sim2 <- splatSimulate(params, batchCells = c(100, 100), batch.rmEffect = TRUE,
                      verbose = FALSE)
sim2 <- logNormCounts(sim2)
sim2 <- runPCA(sim2)
plotPCA(sim2, colour_by = "Batch") + ggtitle("Batch effects removed")
```

The resulting plots are as follows:
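As I understand batch.rmEffect, the simulation still assigns cells to batches, but the batch factors are not applied to the expected expression. A toy base-R illustration of that difference, with hypothetical gene means and illustrative log-normal parameters (not splatter's implementation):

```r
# Toy base-R illustration of batch.rmEffect (hypothetical data, not splatter's code)
set.seed(1)
gene_means <- c(1, 5, 10, 20, 50)              # expected means of 5 genes
batch <- rep(c("Batch1", "Batch2"), each = 3)  # 2 batches x 3 cells

# One log-normal factor per gene and batch (illustrative parameters)
facs <- sapply(c("Batch1", "Batch2"),
               function(b) rlnorm(length(gene_means), meanlog = 0.5, sdlog = 0.5))

# Expected gene x cell means, with or without applying the batch factors
cell_means <- function(apply_batch) {
  sapply(batch, function(b) {
    if (apply_batch) gene_means * facs[, b] else gene_means
  }, USE.NAMES = FALSE)
}

with_batch    <- cell_means(TRUE)   # analogous to batch.rmEffect = FALSE
batch_removed <- cell_means(FALSE)  # analogous to batch.rmEffect = TRUE

# The batch labels still exist, but with the factors dropped both batches
# share identical expected means
identical(batch_removed[, batch == "Batch1"], batch_removed[, batch == "Batch2"])
```

This is why colouring sim2 by "Batch" still works: the labels are there, there is just no systematic difference left for PCA to pick up.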
Outlier parameters
Next, we show how to set the outlier parameter and use splatSimulate to generate data containing outlier genes. The code below generates two SingleCellExperiment objects, sim1 and sim2, with different outlier probabilities (out.prob).

```r
### 9. Outlier parameters
# Few outliers
sim1 <- splatSimulate(out.prob = 0.001, verbose = FALSE)
ggplot(as.data.frame(rowData(sim1)),
       aes(x = log10(GeneMean), fill = OutlierFactor != 1)) +
  geom_histogram(bins = 100) +
  ggtitle("Few outliers")

# Lots of outliers
sim2 <- splatSimulate(out.prob = 0.2, verbose = FALSE)
ggplot(as.data.frame(rowData(sim2)),
       aes(x = log10(GeneMean), fill = OutlierFactor != 1)) +
  geom_histogram(bins = 100) +
  ggtitle("Lots of outliers")
```

Let's look at the final plots together:
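The fill = OutlierFactor != 1 aesthetic works because genes that are not outliers carry a factor of exactly 1. Below is a simplified base-R sketch modelled loosely on the Splat model as I understand it: gene means come from a gamma distribution, each gene becomes an outlier with probability out_prob, and an outlier's mean is set to the median mean times a large log-normal factor. The helper simulate_gene_means and its parameter values are illustrative assumptions, not splatter's code.

```r
# Base-R sketch of Splat-style outlier genes (simplified; values are illustrative)
set.seed(1)
n_genes <- 10000

simulate_gene_means <- function(out_prob, mean_shape = 0.6, mean_rate = 0.3,
                                out_loc = 4, out_scale = 0.5) {
  base_means <- rgamma(n_genes, shape = mean_shape, rate = mean_rate)
  is_outlier <- runif(n_genes) < out_prob
  out_fac <- rep(1, n_genes)                       # non-outliers keep a factor of 1
  out_fac[is_outlier] <- rlnorm(sum(is_outlier), out_loc, out_scale)
  # Outlier genes get the median base mean multiplied by their (large) factor
  gene_mean <- ifelse(is_outlier, median(base_means) * out_fac, base_means)
  data.frame(GeneMean = gene_mean, OutlierFactor = out_fac)
}

few  <- simulate_gene_means(out_prob = 0.001)
lots <- simulate_gene_means(out_prob = 0.2)

# Number of outlier genes in each case (expected values: 10 and 2000)
c(few = sum(few$OutlierFactor != 1), lots = sum(lots$OutlierFactor != 1))
```

The data frames use the same column names as the rowData above, so the same ggplot call (with few or lots in place of as.data.frame(rowData(...))) draws a comparable histogram with the outliers piling up on the right.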
Group parameters
Next, let's see how to set group parameters and use splatSimulateGroups to generate data with different groups. First, we generate sim1 with one big group and one small group (group.prob = c(0.9, 0.1)), then log2-normalise it and run PCA. Next, we generate sim2 with five equally sized groups and process it the same way. Finally, we visualise and compare both results with plotPCA.

```r
### 10. Group parameters
# One small group, one big group
params.groups <- newSplatParams(batchCells = 500, nGenes = 1000)
sim1 <- splatSimulateGroups(params.groups, group.prob = c(0.9, 0.1),
                            verbose = FALSE)
sim1 <- logNormCounts(sim1)
sim1 <- runPCA(sim1)
plotPCA(sim1, colour_by = "Group") + ggtitle("One small group, one big group")

# Five groups
sim2 <- splatSimulateGroups(params.groups,
                            group.prob = c(0.2, 0.2, 0.2, 0.2, 0.2),
                            verbose = FALSE)
sim2 <- logNormCounts(sim2)
sim2 <- runPCA(sim2)
plotPCA(sim2, colour_by = "Group") + ggtitle("Five groups")
```

The resulting plots are as follows:
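group.prob gives the probability that a cell belongs to each group, so its length sets the number of groups and its values should sum to 1. A base-R sketch of drawing group labels with such probabilities; the helper assign_groups is hypothetical and only mirrors the idea, not splatter's internals.

```r
# Base-R sketch: assigning cells to groups with group.prob-style probabilities
set.seed(1)
n_cells <- 500

assign_groups <- function(group_prob) {
  stopifnot(abs(sum(group_prob) - 1) < 1e-8)     # probabilities should sum to 1
  grp <- sample(seq_along(group_prob), n_cells, replace = TRUE, prob = group_prob)
  factor(paste0("Group", grp), levels = paste0("Group", seq_along(group_prob)))
}

two  <- assign_groups(c(0.9, 0.1))
five <- assign_groups(rep(0.2, 5))
table(two)   # expected counts: 450 and 50 cells
table(five)  # expected count: 100 cells per group
```

With 500 cells, the small group in the first simulation holds only about 50 cells, which is worth keeping in mind when judging how clearly it separates in the PCA plot.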
Differential expression parameters
Finally, Xiaoyun shows how to set the differential expression (DE) parameter de.prob and use splatSimulateGroups to generate data with different numbers of DE genes.

```r
### 11. DE gene parameters
## de.prob parameter
# Few DE genes
params.groups <- newSplatParams(batchCells = 500, nGenes = 1000)
sim1 <- splatSimulateGroups(params.groups, group.prob = c(0.5, 0.5),
                            de.prob = 0.01, verbose = FALSE)
sim1 <- logNormCounts(sim1)
sim1 <- runPCA(sim1)
plotPCA(sim1, colour_by = "Group") + ggtitle("Few DE genes")

# Lots of DE genes
sim2 <- splatSimulateGroups(params.groups, group.prob = c(0.5, 0.5),
                            de.prob = 0.3, verbose = FALSE)
sim2 <- logNormCounts(sim2)
sim2 <- runPCA(sim2)
plotPCA(sim2, colour_by = "Group") + ggtitle("Lots of DE genes")
```

The resulting plots are as follows:
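de.prob is the probability that a gene is differentially expressed in a given group; in the Splat model each selected gene then receives a log-normal fold-change factor. The selection step alone can be sketched in base R; the helper de_flags is hypothetical and only illustrates the expected numbers of DE genes for 1000 genes.

```r
# Base-R sketch: each gene is independently DE in each group with probability de_prob
set.seed(1)
n_genes <- 1000

de_flags <- function(de_prob, n_groups = 2) {
  matrix(runif(n_genes * n_groups) < de_prob, nrow = n_genes,
         dimnames = list(NULL, paste0("Group", seq_len(n_groups))))
}

few  <- de_flags(0.01)   # expected about 10 DE genes per group
lots <- de_flags(0.3)    # expected about 300 DE genes per group
colSums(few)
colSums(lots)
```

For the splatter objects themselves, sum(rowData(sim1)$DEFacGroup1 != 1) should give a count of the same order, since genes that are not DE in a group keep a DE factor of 1.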
So, have you mastered these splatter applications? Pretty easy, right? Follow Xiaoyun for more hands-on tutorials!