
STMLST: Serotype Identification By Multi-loci Sequence Typing

2023-07-11 12:06 Author: 抗黑眼圈斗士

2 Methods

The analysis procedure of STMLST is depicted in Figure 1. STMLST first converts the input file to FASTA format and maps the input sequences against an allele sequence database. After parsing the mapping result, STMLST obtains formatted data that can be used to identify a list of candidate organisms. STMLST assigns a high score to an organism if the “Subject sequence length”, “Alignment length” and “Number of identical matches” fields in the formatted data are equal, and a low score if they are not. This yields a list of organisms with their corresponding scores. These operations rest on the following principle: if the input sequences are highly similar to the alleles of an organism, they have a high probability of belonging to that organism. STMLST then uses the information for the highest-scoring organism to construct a search statement and queries the sequence type and serotype databases with it. Finally, STMLST outputs the subtyping result for the input sequences. Details of data collection and the algorithm are given in Methods.
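The exact-match rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the STMLST implementation: the three quoted field names match the descriptions of the BLAST+ tabular fields `slen`, `length` and `nident`, but the column order, the function names and the numeric values behind "high" and "low" are hypothetical.

```python
from collections import defaultdict

# Hypothetical values; the text only distinguishes "high" from "low".
HIGH_SCORE = 10
LOW_SCORE = 1


def score_organisms(tabular_text):
    """Sum per-organism scores from tab-separated alignment hits.

    Assumed column order (not taken from the source):
    organism, allele id, subject sequence length, alignment length,
    number of identical matches.
    """
    scores = defaultdict(int)
    for line in tabular_text.strip().splitlines():
        organism, _allele, slen, length, nident = line.split("\t")
        # A hit is "exact" when the whole allele aligned with no mismatch.
        exact = int(slen) == int(length) == int(nident)
        scores[organism] += HIGH_SCORE if exact else LOW_SCORE
    return dict(scores)


def best_organism(scores):
    """Return the organism with the highest accumulated score."""
    return max(scores, key=scores.get)
```

Calling `score_organisms` on the parsed mapping result and passing its output to `best_organism` gives the organism whose information would then be used to build the search statement.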

2.1 Data Preprocessing

The data required to run the full functionality of STMLST is divided into three parts: a key alleles database for finding similar key alleles, a sequence type database for finding sequence types based on key alleles, and a serotype database for finding the corresponding serotypes based on sequence types. All three types of data are downloaded from PUBMLST, and the relationship between them is shown in Figure 2. The key alleles database consists of key alleles downloaded for more than one hundred organisms. We use local scripts to download these gene sequences and build a BLAST index so that similar key alleles can be found by fast alignment. The sequence type database stores the mapping from combinations of key alleles to the sequence types of each organism; we download it with a local script and store it in an SQLite database. The mapping between serotypes and sequence types is not one-to-one; we extract it from PUBMLST and store it in the same SQLite database.
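To make the two lookups concrete, here is a minimal sketch using Python's standard `sqlite3` module. The table names, column names, the `locus:number` profile encoding and all rows are hypothetical placeholders; the source does not describe the actual schema.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory database with made-up rows for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sequence_type (
            organism TEXT NOT NULL,
            profile  TEXT NOT NULL,  -- allele combination, e.g. 'locusA:1,locusB:2'
            st       TEXT NOT NULL,
            PRIMARY KEY (organism, profile)
        );
        CREATE TABLE serotype (
            organism TEXT NOT NULL,
            st       TEXT NOT NULL,
            serotype TEXT NOT NULL   -- several rows may share one st
        );
    """)
    conn.executemany("INSERT INTO sequence_type VALUES (?, ?, ?)", [
        ("orgA", "locusA:1,locusB:2", "ST1"),
        ("orgA", "locusA:1,locusB:3", "ST2"),
    ])
    conn.executemany("INSERT INTO serotype VALUES (?, ?, ?)", [
        ("orgA", "ST1", "sero-x"),
        ("orgA", "ST1", "sero-y"),
        ("orgA", "ST2", "sero-x"),
    ])
    return conn


def lookup_serotypes(conn, organism, alleles):
    """Map an allele combination to a sequence type, then to its serotypes."""
    profile = ",".join(f"{locus}:{num}" for locus, num in sorted(alleles.items()))
    row = conn.execute(
        "SELECT st FROM sequence_type WHERE organism = ? AND profile = ?",
        (organism, profile),
    ).fetchone()
    if row is None:
        return None, []
    st = row[0]
    rows = conn.execute(
        "SELECT serotype FROM serotype WHERE organism = ? AND st = ? "
        "ORDER BY serotype",
        (organism, st),
    ).fetchall()
    return st, [r[0] for r in rows]
```

Keeping serotypes in a separate table with one row per (sequence type, serotype) pair is one simple way to represent a mapping that is not one-to-one: a sequence type can return several serotypes, and a serotype can appear under several sequence types.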

Fig. 2. The data preprocessing for subtyping.



2.2 Identification Strategy


We first align the input sequences against the key alleles database, record the key alleles that align successfully, and mark each record with one of three states according to the degree of similarity of the alignment. After all records are marked, each candidate organism is given a score based on the marks. The scoring rules are shown in Figure 3. In Equation 1, x is the number of different alleles that are similar for a given allele; the larger x is, the higher the final score f. θ is the weight corresponding to the degree of similarity, with a larger θ representing greater similarity and thus a higher final score f. The calculated s is the score for the alignment result of one allele of the organism. According to Equation 2, the scores of all alleles are accumulated to obtain the final score f, from which STMLST obtains the organism to which the input sequencing data most likely belongs. This organism is then searched in the sequence type database using the key alleles in the records, and the candidate sequence type is obtained from the mapping between key alleles and sequence types. Finally, the possible serotypes are obtained by searching the serotype database with the sequence type. Because the data on serotype identification are not yet complete, we use SeqSero2 as a supplement: when insufficient data leads to a null result for Salmonella serotype identification, we import the serotype identification result of SeqSero2. Combining the advantages of two different subtyping implementations could effectively improve identification accuracy.


$$ s = \frac{1}{1 + e^{\theta x}} \tag{1} $$

$$ f = \left(100 - \sum\nolimits_{allele}^{alleles} s\right) \times w \tag{2} $$

Fig. 3. The scoring process for organisms.
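Equations 1 and 2 can be evaluated directly, as in the short Python sketch below. The state names, the θ values and the default w = 1 are assumptions made for illustration; the text says only that there are three similarity states and does not define w.

```python
import math

# Hypothetical weights for the three similarity states; a larger theta
# stands for greater similarity, as described in the text.
THETA = {"exact": 3.0, "partial": 1.0, "weak": 0.3}


def allele_score(x, theta):
    """Equation 1: s = 1 / (1 + e^(theta * x))."""
    z = theta * x
    if z > 700:  # e^z would overflow a float; s is effectively 0 here
        return 0.0
    return 1.0 / (1.0 + math.exp(z))


def organism_score(records, w=1.0):
    """Equation 2: f = (100 - sum of s over all alleles) * w.

    `records` is a list of (x, state) pairs, one per allele.
    """
    total = sum(allele_score(x, THETA[state]) for x, state in records)
    return (100.0 - total) * w
```

Because s decreases as θx grows, more similar alleles and closer matches subtract less from 100 and give a higher f, which matches the behaviour described for x and θ.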


