散文網(wǎng) » 科技 »學(xué)習(xí) » 神經(jīng)網(wǎng)絡(luò)模型-預(yù)測藥物靶點

神經(jīng)網(wǎng)絡(luò)模型-預(yù)測藥物靶點

2020-07-26 09:48 作者:python_biology 0人讀過 | 我要投稿

python機(jī)器學(xué)習(xí)-乳腺癌細(xì)胞挖掘（博主親自錄制視頻）https://study.163.com/course/introduction.htm?courseId=1005269003&utm_campaign=commission&utm_source=cp-400000000398149&utm_medium=share

藥物靶點

?藥物與機(jī)體生物大分子的結(jié)合部位即藥物靶點。藥物作用靶點涉及受體、酶、離子通道、轉(zhuǎn)運體、免疫系統(tǒng)、基因等。此外，有些藥物通過其理化作用或補充機(jī)體所缺乏的物質(zhì)而發(fā)揮作用。現(xiàn)有藥物中，超過50%的藥物以受體為作用靶點，受體成為最主要和最重要的作用靶點；超過20%的藥物以酶為作用靶點，特別是酶抑制劑，在臨床應(yīng)用中具有特殊地位；6%左右的藥物以離子通道為作用靶點；3%的藥物以核酸為作用靶點；20%藥物的作用靶點尚有待進(jìn)一步研究。

機(jī)器學(xué)習(xí)預(yù)測藥物靶點意義

藥物靶標(biāo)識別是現(xiàn)代新藥研發(fā)的關(guān)鍵,它在藥物毒副作用研究、老藥新用以及個體化治療中都起著十分重要的作用。然而,受到精度、通量和成本的制約,基于生物實驗的傳統(tǒng)藥物靶標(biāo)識別方法通常難以展開。與此同時,隨著信息科學(xué)的迅猛發(fā)展,機(jī)器學(xué)習(xí)、模式識別、數(shù)據(jù)挖掘等智能計算技術(shù)在生物計算領(lǐng)域得到了廣泛的應(yīng)用。在這些技術(shù)的推動下,計算機(jī)輔助的藥物-靶標(biāo)相互作用預(yù)測方法作為一種快速而準(zhǔn)確的藥物靶標(biāo)識別手段,受到越來越多研究者的重視。它能夠利用計算機(jī)的模擬、運算和預(yù)測技術(shù)研究藥物化合物分子與靶標(biāo)蛋白質(zhì)之間的關(guān)系,指導(dǎo)合成新的藥物或修飾已知的藥物結(jié)構(gòu),從而縮短新藥研制時間,減少新藥研制的盲目性并降低研發(fā)成本。因此,作為一種高效而低成本的方法,基于智能計算的藥物-靶標(biāo)相互作用預(yù)測對于靶標(biāo)蛋白確認(rèn)、靶向性藥物開發(fā)以及藥物-靶標(biāo)相互作用網(wǎng)絡(luò)構(gòu)建都具有十分重要的意義。

下面我用神經(jīng)網(wǎng)絡(luò)算法建立預(yù)測藥物靶點的模型

下面是python語言建立模型代碼

# -*- coding: utf-8 -*-

"""

Created on Wed Sep? 5 11:23:58 2018

?

@author: 231469242@qq.com;<br>微信公眾號：pythonEducation

數(shù)據(jù)源與說明文檔

https://www.wildcardconsulting.dk/useful-information/a-deep-tox21-neural-network-with-rdkit-and-keras/

"""

import pandas as pd

import numpy as np

??

#RDkit for fingerprinting and cheminformatics

from rdkit import Chem, DataStructs

from rdkit.Chem import AllChem, rdMolDescriptors

??

#MolVS for standardization and normalization of molecules

import molvs as mv

#Function to get parent of a smiles

def parent(smiles):

?st = mv.Standardizer() #MolVS standardizer

?try:

? mols = st.charge_parent(Chem.MolFromSmiles(smiles))

? return Chem.MolToSmiles(mols)

?except:

? print "%s failed conversion"%smiles

? return "NaN"

??

#Clean and standardize the data

def clean_data(data):

?#remove missing smiles

?data = data[~(data['smiles'].isnull())]

??

?#Standardize and get parent with molvs

?data["smiles_parent"] = data.smiles.apply(parent)

?data = data[~(data['smiles_parent'] == "NaN")]

??

?#Filter small fragents away

?def NumAtoms(smile):

? return Chem.MolFromSmiles(smile).GetNumAtoms()

??

?data["NumAtoms"] = data["smiles_parent"].apply(NumAtoms)

?data = data[data["NumAtoms"] > 3]

?return data

??

#Read the data

data = pd.DataFrame.from_csv('tox21_10k_data_all_pandas.csv')

valdata = pd.DataFrame.from_csv('tox21_10k_challenge_test_pandas.csv')

testdata = pd.DataFrame.from_csv('tox21_10k_challenge_score_pandas.csv')

?

data = clean_data(data)

valdata = clean_data(valdata)

testdata = clean_data(testdata)

?

#Calculate Fingerprints

def morgan_fp(smiles):

?mol = Chem.MolFromSmiles(smiles)

?fp = AllChem.GetMorganFingerprintAsBitVect(mol,3, nBits=8192)

?npfp = np.array(list(fp.ToBitString())).astype('int8')

?return npfp

??

fp = "morgan"

data[fp] = data["smiles_parent"].apply(morgan_fp)

valdata[fp] = valdata["smiles_parent"].apply(morgan_fp)

testdata[fp] = testdata["smiles_parent"].apply(morgan_fp)

#Choose property to model

prop = 'SR-MMP'

??

#Convert to Numpy arrays

X_train = np.array(list(data[~(data[prop].isnull())][fp]))

X_val = np.array(list(valdata[~(valdata[prop].isnull())][fp]))

X_test = np.array(list(testdata[~(testdata[prop].isnull())][fp]))

??

#Select the property values from data where the value of the property is not null and reshape

y_train = data[~(data[prop].isnull())][prop].values.reshape(-1,1)

y_val = valdata[~(valdata[prop].isnull())][prop].values.reshape(-1,1)

y_test = testdata[~(testdata[prop].isnull())][prop].values.reshape(-1,1)

?

#Set network hyper parameters

l1 = 0.000

l2 = 0.016

dropout = 0.5

hidden_dim = 80

??

#Build neural network

model = Sequential()

model.add(Dropout(0.2, input_shape=(X_train.shape[1],)))

for i in range(3):

?wr = WeightRegularizer(l2 = l2, l1 = l1)

?model.add(Dense(output_dim=hidden_dim, activation="relu", W_regularizer=wr))

?model.add(Dropout(dropout))

wr = WeightRegularizer(l2 = l2, l1 = l1)

model.add(Dense(y_train.shape[1], activation='sigmoid',W_regularizer=wr))

??

##Compile model and make it ready for optimization

model.compile(loss='binary_crossentropy', optimizer = SGD(lr=0.005, momentum=0.9, nesterov=True), metrics=['binary_crossentropy'])

#Reduce lr callback

reduce_lr = ReduceLROnPlateau(monitor='loss', factor=0.5,patience=50, min_lr=0.00001, verbose=1)

??

#Training

history = model.fit(X_train, y_train, nb_epoch=1000, batch_size=1000, validation_data=(X_val,y_val), callbacks=[reduce_lr])

#Plot Train History

def plot_history(history):

? ? lw = 2

? ? fig, ax1 = plt.subplots()

? ? ax1.plot(history.epoch, history.history['binary_crossentropy'],c='b', label="Train", lw=lw)

? ? ax1.plot(history.epoch, history.history['val_loss'],c='g', label="Val", lw=lw)

? ? plt.ylim([0.0, max(history.history['binary_crossentropy'])])

? ? ax1.set_xlabel('Epochs')

? ? ax1.set_ylabel('Loss')

? ? ax2 = ax1.twinx()

? ? ax2.plot(history.epoch, history.history['lr'],c='r', label="Learning Rate", lw=lw)

? ? ax2.set_ylabel('Learning rate')

? ? plt.legend()

? ? plt.show()

??

plot_history(history)

?

def show_auc(model):

? ? pred_train = model.predict(X_train)

? ? pred_val = model.predict(X_val)

? ? pred_test = model.predict(X_test)

??

? ? auc_train = roc_auc_score(y_train, pred_train)

? ? auc_val = roc_auc_score(y_val, pred_val)

? ? auc_test = roc_auc_score(y_test, pred_test)

? ? print "AUC, Train:%0.3F Test:%0.3F Val:%0.3F"%(auc_train, auc_test, auc_val)

??

? ? fpr_train, tpr_train, _ =roc_curve(y_train, pred_train)

? ? fpr_val, tpr_val, _ = roc_curve(y_val, pred_val)

? ? fpr_test, tpr_test, _ = roc_curve(y_test, pred_test)

??

? ? plt.figure()

? ? lw = 2

? ? plt.plot(fpr_train, tpr_train, color='b',lw=lw, label='Train ROC (area = %0.2f)'%auc_train)

? ? plt.plot(fpr_val, tpr_val, color='g',lw=lw, label='Val ROC (area = %0.2f)'%auc_val)

? ? plt.plot(fpr_test, tpr_test, color='r',lw=lw, label='Test ROC (area = %0.2f)'%auc_test)

??

? ? plt.plot([0, 1], [0, 1], color='navy', lw=lw, linestyle='--')

? ? plt.xlim([0.0, 1.0])

? ? plt.ylim([0.0, 1.05])

? ? plt.xlabel('False Positive Rate')

? ? plt.ylabel('True Positive Rate')

? ? plt.title('Receiver operating characteristic of %s'%prop)

? ? plt.legend(loc="lower right")

? ? plt.interactive(True)

? ? plt.show()

show_auc(model)

#Compare with a Linear model

from sklearn import linear_model

#prepare scoring lists

fitscores = []

predictscores = []

##prepare a log spaced list of alpha values to test

alphas = np.logspace(-2, 4, num=10)

##Iterate through alphas and fit with Ridge Regression

for alpha in alphas:

? estimator = linear_model.LogisticRegression(C = 1/alpha)

? estimator.fit(X_train,y_train)

? fitscores.append(estimator.score(X_train,y_train))

? predictscores.append(estimator.score(X_val,y_val))

??

#show a plot

import matplotlib.pyplot as plt

ax = plt.gca()

ax.set_xscale('log')

ax.plot(alphas, fitscores,'g')

ax.plot(alphas, predictscores,'b')

plt.xlabel('alpha')

plt.ylabel('Correlation Coefficient')

plt.show()

??

estimator= linear_model.LogisticRegression(C = 1)

estimator.fit(X_train,y_train)

#Predict the test set

y_pred = estimator.predict(X_test)

print roc_auc_score(y_test, y_pred)

結(jié)論：神經(jīng)網(wǎng)絡(luò)算法效果不錯，驗證數(shù)據(jù)的AUC達(dá)到0.78，但模型有過度擬合，需要調(diào)參或嘗試其他算法。歡迎各位學(xué)員參加我的python機(jī)器學(xué)習(xí)生物信息學(xué)系列課，網(wǎng)址為：http://dwz.date/b9vw

標(biāo)簽：

神經(jīng)網(wǎng)絡(luò)模型-預(yù)測藥物靶點的評論 (共條)

愛情散文傷感散文哲理散文優(yōu)美生活隨筆親情唯美句子傷感的句子現(xiàn)代詩歌空間日志經(jīng)典語句愛情句子作文大全

最美情侣中文字幕电影,在线麻豆精品传媒,在线网站高清黄,久久黄色视频

神經(jīng)網(wǎng)絡(luò)模型-預(yù)測藥物靶點

神經(jīng)網(wǎng)絡(luò)模型-預(yù)測藥物靶點的評論 (共條)

你可能也喜歡這些文章

最新發(fā)布的文章

最美情侣中文字幕电影,在线麻豆精品传媒,在线网站高清黄,久久黄色视频

神經(jīng)網(wǎng)絡(luò)模型-預(yù)測藥物靶點

本文作者的其他文章

神經(jīng)網(wǎng)絡(luò)模型-預(yù)測藥物靶點的評論 (共 條)

你可能也喜歡這些文章

最新發(fā)布的文章

神經(jīng)網(wǎng)絡(luò)模型-預(yù)測藥物靶點的評論 (共條)