Align reads to the reference genome and generate a SAM file
# Build the index. If the *.fa is larger than 2 GB, use -a bwtsw; if it is smaller than 2 GB, use -a is (the default).
bwa index -a bwtsw hg19.fa

time bwa mem -M -t 10 -R '@RG\tID:HKV2KALXX.7\tSM:sample1\tPL:illumina\tLB:sample1' $DB/hg19.fa sample1_1.fq sample1_2.fq > aligned_sample1.sam

# -M: mark shorter split hits as secondary (essential for Picard compatibility)
# -t: number of threads
# -R: defines the read-group header line. If it is not defined at this step, it has to be added later for the GATK analysis anyway. The values can be obtained from the sample's fq file.
# @RG: read group. Required, otherwise GATK cannot perform variant calling. Alignment was also observed to be faster with @RG than without, though bwa does not document such an effect.
# ID: read group identifier, flowcell + lane name and number in Illumina data >>> ID:FLOWCELL1.LANE1 (each lane of each flowcell is unique), e.g. HKV2KALXX.7
# PL: platform >>> ILLUMINA
# SM: sample >>> sample1
# LB: DNA preparation library identifier. MarkDuplicates uses the LB
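Since the read-group values can be taken from the fq file, the -R string can be assembled from the first FASTQ header line instead of typed by hand. The following is a minimal sketch, not part of bwa or GATK: rg_from_header is a hypothetical helper, and it assumes the Illumina Casava 1.8+ header layout (@instrument:run:flowcell:lane:tile:x:y ...), with the sample name reused as the library name as in the command above.

```shell
#!/bin/sh
# Hypothetical helper: build a bwa -R read-group string from one FASTQ header.
# Assumes the Casava 1.8+ layout: @instrument:run:flowcell:lane:tile:x:y ...
rg_from_header() {
    header=$1   # first line of the fq file
    sample=$2   # sample name, also used as the library name here (assumption)
    # Field 3 is the flowcell ID, field 4 the lane number.
    flowcell=$(printf '%s\n' "$header" | cut -d: -f3)
    lane=$(printf '%s\n' "$header" | cut -d: -f4)
    # Print literal backslash-t sequences; bwa mem converts them to tabs.
    printf '@RG\\tID:%s.%s\\tSM:%s\\tPL:illumina\\tLB:%s\n' \
        "$flowcell" "$lane" "$sample" "$sample"
}

# Usage (commented out because it needs the input files):
# RG=$(rg_from_header "$(head -n 1 sample1_1.fq)" sample1)
# bwa mem -M -t 10 -R "$RG" $DB/hg19.fa sample1_1.fq sample1_2.fq > aligned_sample1.sam
```

If the fq headers use an older or non-Illumina layout, the field positions passed to cut would need to change.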