二代測(cè)序數(shù)據(jù)根據(jù)barcode分組
當(dāng)我們拿到的二代測(cè)序數(shù)據(jù)的數(shù)據(jù)量非常龐大時(shí),我們?cè)趺磳?duì)其進(jìn)行分組呢,今天小云來(lái)帶你探索其中的奧妙。
二代測(cè)序數(shù)據(jù)經(jīng)過(guò)fastp軟件篩選之后我們就得到了成千上萬(wàn)條如圖所示的序列,每組序列我們都在它的前端添加相同的barcode如圖中紅色標(biāo)記所示,這樣我們就可以用遍歷的方式將相同組的序列放在一起。

因?yàn)殡p端測(cè)序存在反向互補(bǔ)的問(wèn)題,所以首先我們先定義一個(gè)反向互補(bǔ)的函數(shù)。
#互補(bǔ)函數(shù)
def complement(sequence):
????sequence = sequence.upper()
????sequence = sequence.replace('A', 't')
????sequence = sequence.replace('T', 'a')
????sequence = sequence.replace('C', 'g')
????sequence = sequence.replace('G', 'c')
????return sequence.upper()
#反向函數(shù)
def reverse(sequence):
????sequence = sequence.upper()
????return sequence[::-1]
#反向互補(bǔ)函數(shù)
def complement_reverse(sequence):
return complement(reverse(sequence))
?
函數(shù)編寫(xiě)完成后,我們就可以將barcode放入函數(shù)中,就可以輸出反向互補(bǔ)的序列啦。
print(complement_reverse('CCATCAATGCC') ,complement_reverse('CGATGGCGATA'),complement_reverse('ACACAAGCACC'),complement_reverse('CCGTTTCGACG'),complement_reverse('GGAAGTAGACC'))
?
接下來(lái)就是分組根據(jù)barcode分組啦。
rf = open("./輸出文件.fq") #改成自己二代測(cè)序的fq文件
A1= open("./1_A1.txt",'w')#創(chuàng)建txt文件寫(xiě)入不同組數(shù)據(jù)
A2= open("./1_A2.txt",'w')
A3= open("./1_A3.txt",'w')
A4= open("./1_A4.txt",'w')
A5= open("./1_A5.txt",'w')
i=0
for line in rf: ?#遍歷二代測(cè)序數(shù)據(jù)
????line = line.strip()
????if ('CCATCAATGCC' in line):#若序列存在CCATCAATGCC,則寫(xiě)入A1文件中
????????A1.write(line)
????????A1.write('\n')
????elif ('GGCATTGATGG'in line):
????????A1.write(complement_reverse(line))
????????A1.write('\n')
?
????elif ('CGATGGCGATA' in ?line):
????????A2.write(line)
????????A2.write('\n')
????elif ('TATCGCCATCG' in line):
????????A2.write(complement_reverse(line))
????????A2.write('\n')
?
????elif ('ACACAAGCACC' in ?line):
????????A3.write(line)
????????A3.write('\n')
????elif ('GGTGCTTGTGT' in ?line):
????????A3.write(complement_reverse(line))
????????A3.write('\n')
?
????elif ('CCGTTTCGACG' in ?line):
????????A4.write(line)
????????A4.write('\n')
????elif ('CGTCGAAACGG' in ?line):
????????A4.write(complement_reverse(line))
????????A4.write('\n')
????????
????elif ('GGAAGTAGACC' in ?line):
????????A5.write(line)
????????A5.write('\n')
????elif ('GGTCTACTTCC' in ?line):
????????A5.write(complement_reverse(line))
????????A5.write('\n')
?
????i+=1
# ????if i >10000:
# ????????break
????if (i%10000000 ==0): #查看代碼進(jìn)度
????????print(i)
rf.close() #關(guān)閉文件
?
最后我們就得到不同的txt文件,分組也就完成啦。

好了,今天小云的分享就到這里啦,小伙伴們有什么問(wèn)題歡迎來(lái)討論分享呀。

