【ROSALIND】【練Python,學(xué)生信】41 構(gòu)建距離矩陣

如果第一次閱讀本系列文檔請先移步閱讀【ROSALIND】【練Python,學(xué)生信】00 寫在前面 ?謝謝配合~

題目:
構(gòu)建距離矩陣(Creating a Distance Matrix)
Given: A collection of n (n≤10) DNA strings s1,…,sn of equal length (at most 1 kbp). Strings are given in FASTA format.
所給:不多于十條DNA序列,長度都相等(最多1kb),以FASTA格式給出。
Return: The matrix D corresponding to the p-distance dp on the given strings. As always, note that your answer is allowed an absolute error of 0.001.
需得:所給序列的距離矩陣,允許絕對誤差不超過0.001。
?
測試數(shù)據(jù)
>Rosalind_9499
TTTCCATTTA
>Rosalind_0942
GATTCATTTC
>Rosalind_6568
TTTCCATTTT
>Rosalind_1833
GTTCCATTTA
測試輸出
0.00000 0.40000 0.10000 0.10000
0.40000 0.00000 0.40000 0.30000
0.10000 0.40000 0.00000 0.20000
0.10000 0.30000 0.20000 0.00000
?
生物學(xué)背景
????????構(gòu)造進化樹有多種方法,各有優(yōu)缺點,其中一種是基于距離矩陣的方法。即,用不同類之間的距離來構(gòu)建樹。計算距離的方法也有多種,這里采用Hamming距離,有關(guān)Hamming距離的介紹請見06 DNA序列Hamming距離的計算。
????????對兩個長度相等的DNA序列s1和s2,計算遺傳距離最簡單的方法是算p距離(p-distance),記為dp(s1,s2),是s1和s2之間不同的字符數(shù)占字符總數(shù)的比例。對n個長度相等的DNA序列s1,s2…sn,其距離矩陣為Di,j=d(si,sj)。
?
?
思路
????????這道題我直接用三層循環(huán)實現(xiàn),前兩層將序列兩兩取出,在最后一層循環(huán)依次比較。
?
代碼
def readfasta(lines):
??? """讀入fasta格式文件的函數(shù)"""
??? seq = []
??? index = []
??? seqplast = ""
??? numlines = 0
??? for i in lines:
??????? if '>' in i:
??????????? index.append(i.replace("\n", "").replace(">", ""))
??????????? seq.append(seqplast.replace("\n", ""))
??????????? seqplast = ""
??????????? numlines += 1
??????? else:
??????????? seqplast = seqplast + i.replace("\n", "")
??????????? numlines += 1
??????? if numlines == len(lines):
??????????? seq.append(seqplast.replace("\n", ""))
??? seq = seq[1:]
??? return index, seq
f = open('input.txt', 'r')
lines = f.readlines()
f.close()
[index, seq] = readfasta(lines)
length = len(seq[0]) # 每個序列的字符數(shù)
result = []
for i in range(0, len(seq)): # 以下兩個循環(huán)實現(xiàn)序列兩兩取出
??? tep = []
??? for j in range(0, len(seq)):
??????? value = 0
??????? for k in range(length): # 最內(nèi)層循環(huán)實現(xiàn)兩序列間字符依次比較
??????????? if seq[i][k] != seq[j][k]:
??????????????? value += 1
??????? tep.append(value/length)
??? result.append(tep)
f = open('output.txt', 'a')
for i in range(len(result)): # 按所給格式輸出
??? for j in range(len(result[i])):
??????? f.write(str(result[i][j]))
??????? f.write(' ')
??? f.write('\n')
f.close()