【ROSALIND】【練Python,學(xué)生信】21 確定限制酶酶切位點(diǎn)

如果第一次閱讀本系列文檔請(qǐng)先移步閱讀【ROSALIND】【練Python,學(xué)生信】00 寫在前面 ?謝謝配合~

題目:
確定限制酶酶切位點(diǎn)(Locating Restriction Sites)
Given: A DNA string of length at most 1 kbp in FASTA format.
所給:一條長(zhǎng)度不超過(guò)1kb的DNA序列。
Return: The position and length of every reverse palindrome in the string having length between 4 and 12. You may return these pairs in any order.
需得:這條序列上長(zhǎng)度在4-12nt間的回文序列的位置和長(zhǎng)度,以任意順序輸出。
?
測(cè)試數(shù)據(jù)
>Rosalind_24
TCAATGCATGCGGGTCTATATGCAT
測(cè)試輸出
4 6
5 4
6 6
7 4
17 4
18 4
20 6
21 4
?
背景
限制性核酸內(nèi)切酶是可以識(shí)別并附著特定的脫氧核苷酸序列,并在每條鏈中特定部位的兩個(gè)脫氧核糖核苷酸之間的磷酸二酯鍵進(jìn)行切割的一類酶,簡(jiǎn)稱限制酶。限制酶識(shí)別DNA序列中的回文序列,回文序列是一種旋轉(zhuǎn)對(duì)稱結(jié)構(gòu),在軸的兩側(cè)序列相同而反向,特點(diǎn)是在該段的堿基序列的互補(bǔ)鏈之間正讀反讀都相同。
限制酶的生理作用是限制酶降解外源DNA,維護(hù)宿主遺傳穩(wěn)定的保護(hù)機(jī)制。限制酶在幾乎所有細(xì)菌的種、屬中都有發(fā)現(xiàn),是細(xì)菌防御噬菌體侵染的機(jī)制。
?
思路
本題只需抓住回文序列的特點(diǎn)即可,即互補(bǔ)鏈反向閱讀結(jié)果與正鏈相同。因此挨個(gè)掃描序列中長(zhǎng)度在4-12之間的片段,與反向互補(bǔ)序列比較,相同即為酶切位點(diǎn)序列。
?
代碼
def readfasta(lines):
??? seq = []
??? index = []
??? seqplast = ""
??? numlines = 0
??? for i in lines:
??????? if '>' in i:
??????????? index.append(i.replace("\n", "").replace(">", ""))
??????????? seq.append(seqplast.replace("\n", ""))
??????????? seqplast = ""
??????????? numlines += 1
??????? else:
??????????? seqplast = seqplast + i.replace("\n", "")
??????????? numlines += 1
??????? if numlines == len(lines):
??????????? seq.append(seqplast.replace("\n", ""))
??? seq = seq[1:]
??? return index, seq
?
?
f = open(rosalind_revp.txt', 'r')
lines = f.readlines()
f.close()
[index, seq] = readfasta(lines)
seq = seq[0]
?
c = ""
for i in seq:
??? if i == 'A':
??????? c = c + 'T'
??? elif i == 'G':
??????? c = c + 'C'
??? elif i == 'T':
??????? c = c + 'A'
??? elif i == 'C':
??????? c = c + 'G'
rseq = c? # rseq中存儲(chǔ)原序列的互補(bǔ)序列
?
i = 0
p = []
l = []
while i < len(seq)-4+1:? # 逐個(gè)掃描序列,因長(zhǎng)度最小為4,所以到倒數(shù)第四個(gè)即可
??? j = 4
??? if len(seq) - i >= 12:
??????? while j <= 12:? # 當(dāng)未掃描序列還很長(zhǎng)時(shí),只需掃描到開始位點(diǎn)后的12位
??????????? s1 = seq[i:i+j]
??????????? s2 = rseq[i:i+j]
??????????? s2 = s2[::-1]? # 序列反向
??????????? if s1 == s2:? # 滿足回文序列
??????????????? p.append(i+1)
??????????????? l.append(j)
??????????? j += 1
??? else:
??????? while j <= len(seq) - i:? 當(dāng)未掃描序列長(zhǎng)度不足12,掃到序列結(jié)尾
??????????? s1 = seq[i:i+j]
??????????? s2 = rseq[i:i+j]
??????????? s2 = s2[::-1]
??????????? if s1 == s2:
??????????????? p.append(i+1)
??????????????? l.append(j)
??????????? j += 1
??? i += 1
?
f = open('output.txt', 'a')
i = 0
while i < len(p):
??? f.write(str(p[i]) + " " + str(l[i]) + '\n')? # 按格式輸出到文件
??? i += 1
f.close()