【ROSALIND】【練Python,學(xué)生信】31 轉(zhuǎn)換與顛換

如果第一次閱讀本系列文檔請先移步閱讀【ROSALIND】【練Python,學(xué)生信】00 寫在前面 ?謝謝配合~

題目:
轉(zhuǎn)換與顛換(Transitions and Transversions)
Given: Two DNA strings s1 and s2 of equal length (at most 1 kbp).
所給:兩條不超過1kb長的DNA序列s1和s2。
Return: The transition/transversion ratio R(s1,s2).
需得:轉(zhuǎn)換與顛換頻率的比值R(s1,s2)。
?
測試數(shù)據(jù)
>Rosalind_0209
GCAACGCACAACGAAAACCCTTAGGGACTGGATTATTTCGTGATCGTTGTAGTTATTGGA
AGTACGGGCATCAACCCAGTT
>Rosalind_2200
TTATCTGACAAAGAAAGCCGTCAACGGCTGGATAATTTCGCGATCGTGCTGGTTACTGGC
GGTACGAGTGTTCCTTTGGGT
測試輸出
1.21428571429
?
生物學(xué)背景
????????點(diǎn)突變包括兩種類型:轉(zhuǎn)換(transition)和顛換(transversion)。轉(zhuǎn)換是嘌呤與嘌呤,或嘧啶與嘧啶之間的替換,即A與G,T與C之間的替換;顛換則是嘌呤與嘧啶之間的替換。簡單來說,轉(zhuǎn)換不改變堿基的種類,顛換會(huì)改變。如下圖:

????????因?yàn)轭崜Q的改變更為劇烈,所以發(fā)生的頻率更低。在基因組中,轉(zhuǎn)換與顛換頻率的比值約為2。在蛋白編碼區(qū),這個(gè)比值可以超過3,因?yàn)橄鄬τ陬崜Q,轉(zhuǎn)換不容易改變密碼子編碼的氨基酸。也因?yàn)檫@個(gè)原因,轉(zhuǎn)換與顛換頻率的比值可以幫我們鑒定蛋白編碼區(qū)。
?
數(shù)學(xué)背景
????????將序列發(fā)生轉(zhuǎn)換與顛換的次數(shù)相比則得到所求比值。
?
思路
????????本題思路很簡單,只需比較兩序列,記錄轉(zhuǎn)換和顛換的次數(shù),相比即可。
?
代碼
def readfasta(lines):
??? """讀入fasta格式文件的函數(shù)"""
??? seq = []
??? index = []
??? seqplast = ""
??? numlines = 0
??? for i in lines:
??????? if '>' in i:
??????????? index.append(i.replace("\n", "").replace(">", ""))
??????????? seq.append(seqplast.replace("\n", ""))
??????????? seqplast = ""
??????????? numlines += 1
??????? else:
??????????? seqplast = seqplast + i.replace("\n", "")
??????????? numlines += 1
??????? if numlines == len(lines):
??????????? seq.append(seqplast.replace("\n", ""))
??? seq = seq[1:]
??? return index, seq
?
?
f = open('rosalind_tran.txt', 'r')
lines = f.readlines()
f.close()
[index, seq] = readfasta(lines)
s1 = seq[0]
s2 = seq[1]
i = 0
ti = 0 # 記錄轉(zhuǎn)換
tv = 0 # 記錄顛換
while i < len(s1):
??? if (s1[i] == 'A' and s2[i] == 'G') or (s1[i] == 'G' and s2[i] == 'A') or (s1[i] == 'C' and s2[i] == 'T') or (s1[i] == 'T' and s2[i] == 'C'):
??????? ti = ti + 1
??? elif ((s1[i] == 'A' and s2[i] == 'T') or (s1[i] == 'A' and s2[i] == 'C') or (s1[i] == 'G' and s2[i] == 'T') or (s1[i] == 'G' and s2[i] == 'C') or
??????? (s1[i] == 'C' and s2[i] == 'G') or (s1[i] == 'C' and s2[i] == 'A') or (s1[i] == 'T' and s2[i] == 'G') or (s1[i] == 'T' and s2[i] == 'A')):
??????? tv = tv + 1
??? i += 1
per = ti / tv
print(round(per, 11))
?