Deep learning in drug design and discovery
Authors: HUANG Niu, LI Wei | Published: 2019-03-20

Acta Pharmaceutica Sinica, 2019, 54(5): 761-767
Cite this article:
LI Wei, YANG Jin-cai, HUANG Niu. Deep learning in drug design and discovery[J]. Acta Pharmaceutica Sinica, 2019, 54(5): 761-767.


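Deep learning in drug design and discovery
LI Wei2, YANG Jin-cai1, HUANG Niu1,3
1. National Institute of Biological Sciences, Beijing 102206, China;
2. RPXDs (Suzhou) Biotechnology Co., Ltd., Suzhou, Jiangsu 215123, China;
3. Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing 102206, China
Abstract:
Among the many technologies used in drug design and discovery for new drug development, deep learning is still at an early stage. In recent years, however, its distinctive characteristics have led to applications in several areas, including the generation of virtual compound libraries, the prediction of compound activity, metabolism, and toxicity, and the prediction of organic synthesis reactions. Compared with traditional machine learning methods, deep learning shows no clear advantage in predictive power, but it does not require features to be summarized from the data by hand; instead, it learns and extracts features automatically. Compared with first-principles-based computational chemistry, deep learning generalizes less well because it depends on large, clearly labeled data sets, yet its atom-centered convolutional representations are beginning to assist computational chemistry. Deep learning is a rapidly developing emerging technology, and methods such as unsupervised learning that do not depend on large amounts of labeled data are gradually maturing; it is expected to better support new drug research and development.
Key words:    new drug research and development    deep learning    machine learning    computational chemistry    de novo drug design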


Deep learning in drug design and discovery

LI Wei2, YANG Jin-cai1, HUANG Niu1,3

1. National Institute of Biological Sciences, Beijing 102206, China;
2. RPXDs (Suzhou) Biotechnology Co., Ltd., Suzhou 215123, China;
3. Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing 102206, China
Abstract: 
Among the various technologies used in drug design and discovery, deep learning is still in its infancy. Recently, deep learning approaches have developed rapidly and have been applied to a range of problems in drug discovery, including the generation of virtual compound libraries, the prediction of compound activity, metabolism, and toxicity, and the prediction of organic synthesis routes. Compared with traditional machine learning methods, deep learning has not shown a significant improvement in predictive power. However, its ability to learn and to extract features automatically is an advantage. Compared with first-principles-based computational chemistry methods, deep learning generalizes poorly because it depends on large, high-quality annotated data sets, but its molecular representation based on single-atom atomic environment vectors could be useful to computational chemists. As an emerging technology, deep learning is gradually improving, especially unsupervised learning methods that do not rely on large labeled data sets. Deep learning is expected to become practical for drug discovery in the future.
Key words:    drug discovery    deep learning    machine learning    computational chemistry    de novo design
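
The atom-centered representation mentioned in the abstract (single-atom "atomic environment vectors") can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes only the radial part of a Behler-Parrinello-style symmetry function with a cosine cutoff, ignores element types and angular terms, and uses hypothetical names (`cosine_cutoff`, `radial_aev`) and illustrative parameter values (`eta`, `shifts`, `r_cut`).

```python
import math


def cosine_cutoff(r, r_cut):
    """Decay smoothly from 1 at r = 0 to 0 at r = r_cut; zero beyond r_cut."""
    if r >= r_cut:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / r_cut) + 1.0)


def radial_aev(coords, eta=4.0, shifts=(0.5, 1.0, 1.5, 2.0), r_cut=3.0):
    """Return one radial descriptor vector per atom.

    Component k for atom i is
        sum over j != i of exp(-eta * (r_ij - shifts[k])**2) * fc(r_ij),
    so each atom is described only by the distances to its neighbours.
    All parameter values here are illustrative assumptions.
    """
    vectors = []
    for i, ci in enumerate(coords):
        vec = [0.0] * len(shifts)
        for j, cj in enumerate(coords):
            if i == j:
                continue
            r = math.dist(ci, cj)  # Euclidean distance (Python 3.8+)
            fc = cosine_cutoff(r, r_cut)
            for k, r_shift in enumerate(shifts):
                vec[k] += math.exp(-eta * (r - r_shift) ** 2) * fc
        vectors.append(vec)
    return vectors


# Hypothetical water-like geometry in angstroms (illustrative values only).
water = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
aevs = radial_aev(water)
```

Because the descriptor depends only on interatomic distances, it is unchanged when the whole molecule is translated or rotated, which is one reason representations of this kind are attractive as neural network inputs for computational chemistry.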
