最美情侣中文字幕电影,在线麻豆精品传媒,在线网站高清黄,久久黄色视频

歡迎光臨散文網(wǎng) 會(huì)員登陸 & 注冊(cè)

【ROSALIND】【練Python,學(xué)生信】34 測(cè)序片段錯(cuò)誤修正

2020-01-24 16:25 作者:未琢  | 我要投稿

如果第一次閱讀本系列文檔請(qǐng)先移步閱讀【ROSALIND】【練Python,學(xué)生信】00 寫(xiě)在前面 ?謝謝配合~

題目:

修正測(cè)序片段中的錯(cuò)誤(Error Correction in Reads)

Given: A collection of up to 1000 reads of equal length (at most 50 bp) in FASTA format. Some of these reads were generated with a single-nucleotide error. For each read s in the dataset, one of the following applies:

s was correctly sequenced and appears in the dataset at least twice (possibly as a reverse complement);

s is incorrect, it appears in the dataset exactly once, and its Hamming distance is 1 with respect to exactly one correct read in the dataset (or its reverse complement).

所給:約1000條等長(zhǎng)的測(cè)序片段,最多不長(zhǎng)過(guò)50bp,以FASTA格式給出。一些片段包含單個(gè)的堿基替換。單個(gè)片段s符合以下兩種情況之一:

????????s是正確的序列,在數(shù)據(jù)集中至少出現(xiàn)兩次(可能以反向互補(bǔ)序列的形式給出);

????????s包含錯(cuò)誤堿基,在數(shù)據(jù)集中只出現(xiàn)一次,與正確序列僅相差一個(gè)堿基。

Return: A list of all corrections in the form "[old read]->[new read]". (Each correction must be a single symbol substitution, and you may return the corrections in any order.)

需得:以錯(cuò)誤序列->修正序列形式給出的列表,可以用任何順序排列。

?

測(cè)試數(shù)據(jù)

>Rosalind_52

TCATC

>Rosalind_44

TTCAT

>Rosalind_68

TCATC

>Rosalind_28

TGAAA

>Rosalind_95

GAGGA

>Rosalind_66

TTTCA

>Rosalind_33

ATCAA

>Rosalind_21

TTGAT

>Rosalind_18

TTTCC

測(cè)試輸出

TTCAT->TTGAT

GAGGA->GATGA

TTTCC->TTTCA

?

生物學(xué)背景

????????二代測(cè)序讀長(zhǎng)很短,錯(cuò)誤率較高且無(wú)法預(yù)測(cè)。當(dāng)發(fā)現(xiàn)有堿基替換時(shí),需要考慮是真正的突變,還是由測(cè)序錯(cuò)誤導(dǎo)致。

?

思路

????????我的思路是用兩重循環(huán)掃描數(shù)據(jù)集,通過(guò)挨個(gè)比較,將數(shù)據(jù)分為正確序列和錯(cuò)誤序列兩部分。輸出的時(shí)候,再用兩重循環(huán),遍歷正確和錯(cuò)誤的部分,得到結(jié)果。

?

代碼

def readfasta(lines):
???
"""讀入fasta格式文件的函數(shù)"""
???
seq = []
??? index = []
??? seqplast =
""
???
numlines = 0
???
for i in lines:
???????
if '>' in i:
??????????? index.append(i.replace(
"\n", "").replace(">", ""))
??????????? seq.append(seqplast.replace(
"\n", ""))
??????????? seqplast =
""
???????????
numlines += 1
???????
else:
??????????? seqplast = seqplast + i.replace(
"\n", "")
??????????? numlines +=
1
???????
if numlines == len(lines):
??????????? seq.append(seqplast.replace(
"\n", ""))
??? seq = seq[
1:]
???
return index, seq


def recom(s):
???
"""輸出原序列反向互補(bǔ)序列的函數(shù)"""
???
re = s[::-1]
??? c =
""
???
for i in re:
???????
if i == 'A':
??????????? c = c +
'T'
???????
elif i == 'G':
??????????? c = c +
'C'
???????
elif i == 'T':
??????????? c = c +
'A'
???????
elif i == 'C':
??????????? c = c +
'G'

???
return c


def hamm(s1, s2):
???
"""計(jì)算漢明距離的函數(shù)"""

???
hs = 0
???
for i in range(len(s1)):
???????
if s1[i] != s2[i]:
??????????? hs +=
1

???
return hs

?

?

f = open('input.txt', 'r')
lines = f.readlines()
f.close()

[index, seq] = readfasta(lines)
recomseq = []
# 存儲(chǔ)反向互補(bǔ)序列
correct = [] # 存儲(chǔ)正確的序列
wrong = [] # 存儲(chǔ)錯(cuò)誤的序列
length = len(seq)
for i in range(length): # 依次取出每個(gè)測(cè)序片段
???
flag = 0
???
recomseq = recom(seq[i]) # 將其轉(zhuǎn)為反向互補(bǔ)序列
???
if seq[i] not in correct and recomseq not in correct: # 如果正反序列都不在已有的正確序列中
???????
for j in range(i+1,len(seq)):
???????????
if hamm(seq[i], seq[j]) == 0: # 如果在還未掃描的讀長(zhǎng)中發(fā)現(xiàn)了相同的序列
???????????????
flag = 1 # flag標(biāo)記是否有相同序列
???????????
elif hamm(recomseq, seq[j]) == 0: # 如果在還未掃描的讀長(zhǎng)中發(fā)現(xiàn)了反向互補(bǔ)序列的相同序列
???????????????
flag = 1
???????
if flag == 1: # 如果有相同序列,這個(gè)讀長(zhǎng)是正確的
???????????
correct.append(seq[i])
???????
else: # 反之,是錯(cuò)誤序列
???????????
wrong.append(seq[i])

f =
open('output.txt', 'a')

for i in wrong:
???
for j in correct: # 依次取出錯(cuò)誤序列,與正確序列挨個(gè)比較
???????
recomseq = recom(j)
???????
if hamm(i, j) == 1: # 如果存在一個(gè)堿基替換
???????????
f.write(i + '->' + j)
??????????? f.write(
'\n')
???????????
# print(i, end='->')
??????????? # print(j)
???????
elif hamm(i, recomseq) == 1:
??????????? f.write(i, +
'->' + recomseq)
??????????? f.write(
'\n')
???????????
# print(i, end='->')
??????????? # print(recomseq)

f.close()


【ROSALIND】【練Python,學(xué)生信】34 測(cè)序片段錯(cuò)誤修正的評(píng)論 (共 條)

分享到微博請(qǐng)遵守國(guó)家法律
萨迦县| 北流市| 巍山| 敦煌市| 沅江市| 定结县| 福鼎市| 宁德市| 海伦市| 万山特区| 景东| 宁强县| 正宁县| 商城县| 格尔木市| 兴安盟| 托克逊县| 斗六市| 北海市| 巴林左旗| 洛川县| 新昌县| 吴旗县| 盱眙县| 阜阳市| 潍坊市| 阿尔山市| 磐安县| 大厂| 池州市| 巴里| 洱源县| 盖州市| 宕昌县| 桓仁| 德安县| 宿松县| 武功县| 富平县| 吴忠市| 乾安县|