VLA-SMILES: Variable-Length-Array SMILES Descriptors in Neural Network-Based QSAR Modeling
Abstract
1. Introduction
2. Materials and Methods
2.1. Dataset Description
2.2. Data Encoding: Variable-Length-Array (VLA) SMILES-Based Descriptors
2.3. Theoretical Background: Multilayer Perceptron and Statistical Metrics of the Model Prediction Ability
2.4. Formation of Training and Testing Sets: Method of Rational Splitting
2.5. Training Algorithms
2.6. Statistical Criteria for Predictive Ability of QSAR Models
3. Results
3.1. Comparison of Kennard–Stone and Ranking by Activity Splitting Methodologies
1. The Kennard–Stone-based train–test splitting was found to be more efficient than ranking by activity for the investigated QSAR models (a selection sketch follows this list).
2. The models built on variable-length-array SMILES D1, D2, D4, or D6 showed equivalent predictive ability when implemented together with the Kennard–Stone partitioning; they formed the first group of models, with high predictive ability and testing-set RMSE not exceeding 0.85. All types of VLA-SMILES-based models with ranking-by-activity partitioning fell into the second group, of low predictive ability.
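For reference, the sketch below illustrates the Kennard–Stone max–min selection used for the train–test partitioning (Kennard and Stone, 1969). It is a minimal C++ rendition, not the authors' supplementary code; the descriptor matrix `X`, the squared-Euclidean metric, and the helper name `kennardStone` are assumptions of the sketch.

```cpp
// Minimal Kennard-Stone selection sketch (illustrative C++, not the authors'
// supplementary code). Greedy max-min selection over a descriptor matrix X
// with a squared-Euclidean metric, per Kennard and Stone (1969).
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

static double dist2(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        s += d * d; // squared Euclidean distance (monotone in the true distance)
    }
    return s;
}

// Returns the indices chosen for the training set; the rest form the testing set.
// Assumes X.size() >= 2 and 2 <= nTrain <= X.size().
std::vector<int> kennardStone(const std::vector<std::vector<double>>& X, int nTrain) {
    const int n = static_cast<int>(X.size());
    std::vector<bool> chosen(n, false);
    // Seed with the two mutually most distant samples.
    int a = 0, b = 1;
    double best = -1.0;
    for (int i = 0; i < n; ++i)
        for (int j = i + 1; j < n; ++j) {
            const double d = dist2(X[i], X[j]);
            if (d > best) { best = d; a = i; b = j; }
        }
    std::vector<int> train = {a, b};
    chosen[a] = chosen[b] = true;
    // Repeatedly add the sample whose nearest already-chosen neighbor is farthest.
    while (static_cast<int>(train.size()) < std::min(nTrain, n)) {
        int pick = -1;
        double bestMin = -1.0;
        for (int i = 0; i < n; ++i) {
            if (chosen[i]) continue;
            double dMin = std::numeric_limits<double>::max();
            for (int t : train) dMin = std::min(dMin, dist2(X[i], X[t]));
            if (dMin > bestMin) { bestMin = dMin; pick = i; }
        }
        chosen[pick] = true;
        train.push_back(pick);
    }
    return train;
}
```

Selecting training compounds this way spreads them across descriptor space, so testing compounds tend to fall inside the region the model was trained on, consistent with the lower testing RMSE reported for the Kennard–Stone splits.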
3.2. Analysis of Predictive Ability Concerning Activation Functions
3.3. MLP Prediction Models with Two Hidden Layers
3.4. Deep Learning, MLP Autoencoder
3.5. Statistical Analysis of QSAR Model Prediction Ability
4. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
- MLP_Adam_1l.cpp: MLP model with one hidden layer, trained with the Adam optimizer;
- MLP_Adam_2l.cpp: MLP model with two hidden layers, trained with the Adam optimizer;
- DNN_Adam_AutoEncoder.cpp: MLP model with three hidden layers and an autoencoder, trained with the Adam optimizer;
- MLP_iRPROP-_1l.cpp: MLP model with one hidden layer, trained with the resilient iRPROP− algorithm;
- MLP_iRPROP-_2l.cpp: MLP model with two hidden layers, trained with the resilient iRPROP− algorithm;
- MLP_ATransformedBP.cpp: MLP model with one hidden layer, trained with the affine-transformed backpropagation algorithm (ATransformedBP);
- SMILES_LIG.dat, BINARY_SMILES.dat, and pAct.dat: input files containing the ligand structures (SMILES), the same structures in binary encoding, and the corresponding activity data, respectively (an illustrative encoding sketch follows this list).
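For orientation, the sketch below shows one way a SMILES string can be reduced to a variable-length numeric array: each character contributes its 8-bit code, and k consecutive characters are packed into a single normalized descriptor, so larger k yields shorter input vectors, consistent with the k = 1–16 formats in the result tables below. The packing and normalization details here are illustrative assumptions, not necessarily the exact encoding behind BINARY_SMILES.dat.

```cpp
// Hedged sketch of a VLA-SMILES-style featurization (assumed details; the
// exact encoding used to produce BINARY_SMILES.dat may differ). Each SMILES
// character contributes its 8-bit code, and k consecutive characters are
// packed into one descriptor value, so larger k yields a shorter input array.
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

std::vector<double> vlaSmilesDescriptor(const std::string& smiles, int k) {
    std::vector<double> descriptor;
    const double scale = std::pow(256.0, k); // normalizes packed values into [0, 1)
    for (std::size_t pos = 0; pos < smiles.size(); pos += k) {
        double packed = 0.0; // accumulate base-256 digits as a double to allow large k
        for (int j = 0; j < k; ++j) {
            const unsigned char c =
                (pos + j < smiles.size()) ? static_cast<unsigned char>(smiles[pos + j]) : 0;
            packed = packed * 256.0 + static_cast<double>(c); // zero-pad the last group
        }
        descriptor.push_back(packed / scale);
    }
    return descriptor;
}
```

For example, vlaSmilesDescriptor("CCO", 2) would produce two values (one for "CC", one for "O" zero-padded), whereas k = 1 would produce three; this is the sense in which the array length varies with k.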
Acknowledgments
Conflicts of Interest
References
- Vamathevan, J.; Clark, D.; Czodrowski, P.; Dunham, I.; Ferran, E.; Lee, G.; Li, B.; Madabhushi, A.; Shah, P.; Spitzer, M.; et al. Applications of machine learning in drug discovery and development. Nat. Rev. Drug Discov. 2019, 18, 463–477.
- Ekins, S.; Puhl, A.C.; Zorn, K.M.; Lane, T.R.; Russo, D.P.; Klein, J.J.; Hickey, A.J.; Clark, A.M. Exploiting machine learning for end-to-end drug discovery and development. Nat. Mater. 2019, 18, 435–441.
- Senior, A.W.; Evans, R.; Jumper, J.; Kirkpatrick, J.; Sifre, L.; Green, T.; Qin, C.; Žídek, A.; Nelson, A.W.R.; Bridgland, A.; et al. Improved protein structure prediction using potentials from deep learning. Nature 2020, 577, 706–710.
- Yasonik, J. Multiobjective de novo drug design with recurrent neural networks and nondominated sorting. J. Cheminform. 2020, 12, 14.
- Sakai, M.; Nagayasu, K.; Shibui, N.; Andoh, C.; Takayama, K.; Shirakawa, H.; Kaneko, S. Prediction of pharmacological activities from chemical structures with graph convolutional neural networks. Sci. Rep. 2021, 11, 525.
- Tsou, L.K.; Yeh, S.-H.; Ueng, S.-H.; Chang, C.-P.; Song, J.-S.; Wu, M.-H.; Chang, H.-F.; Chen, S.-R.; Shih, C.; Chen, C.-T.; et al. Comparative study between deep learning and QSAR classifications for TNBC inhibitors and novel GPCR agonist discovery. Sci. Rep. 2020, 10, 16771.
- Cherkasov, A.; Muratov, E.N.; Fourches, D.; Varnek, A.; Baskin, I.I.; Cronin, M.; Dearden, J.; Gramatica, P.; Martin, Y.C.; Todeschini, R.; et al. QSAR Modeling: Where Have You Been? Where Are You Going To? J. Med. Chem. 2014, 57, 4977–5010.
- Reymond, J.-L.; Ruddigkeit, L.; Blum, L.; van Deursen, R. The enumeration of chemical space. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2012, 2, 717–733.
- Wong, C.H.; Siah, K.W.; Lo, A.W. Estimation of clinical trial success rates and related parameters. Biostatistics 2019, 20, 273–286.
- Itskowitz, P.; Tropsha, A. k Nearest Neighbors QSAR Modeling as a Variational Problem: Theory and Applications. J. Chem. Inf. Model. 2005, 45, 777–785.
- Svetnik, V.; Liaw, A.; Tong, C.; Culberson, J.C.; Sheridan, R.P.; Feuston, B.P. Random Forest: A Classification and Regression Tool for Compound Classification and QSAR Modeling. J. Chem. Inf. Comput. Sci. 2003, 43, 1947–1958.
- Strieth-Kalthoff, F.; Sandfort, F.; Segler, M.H.S.; Glorius, F. Machine learning the ropes: Principles, applications and directions in synthetic chemistry. Chem. Soc. Rev. 2020, 49, 6154–6168.
- Jiménez-Luna, J.; Grisoni, F.; Schneider, G. Drug discovery with explainable artificial intelligence. Nat. Mach. Intell. 2020, 2, 573–584.
- Wu, Z.; Zhu, M.; Kang, Y.; Leung, E.L.-H.; Lei, T.; Shen, C.; Jiang, D.; Wang, Z.; Cao, D.; Hou, T. Do we need different machine learning algorithms for QSAR modeling? A comprehensive assessment of 16 machine learning algorithms on 14 QSAR data sets. Brief. Bioinform. 2021, 22, bbaa321.
- Baskin, I.I.; Palyulin, V.A.; Zefirov, N.S. Neural Networks in Building QSAR Models. In Artificial Neural Networks: Methods and Applications; Livingstone, D.J., Ed.; Humana Press: Totowa, NJ, USA, 2009; pp. 133–154.
- Hisaki, T.; Aiba née Kaneko, M.; Yamaguchi, M.; Sasa, H.; Kouzuki, H. Development of QSAR models using artificial neural network analysis for risk assessment of repeated-dose, reproductive, and developmental toxicities of cosmetic ingredients. J. Toxicol. Sci. 2015, 40, 163–180.
- Žuvela, P.; David, J.; Wong, M.W. Interpretation of ANN-based QSAR models for prediction of antioxidant activity of flavonoids. J. Comput. Chem. 2018, 39, 953–963.
- Muratov, E.N.; Bajorath, J.; Sheridan, R.P.; Tetko, I.V.; Filimonov, D.; Poroikov, V.; Oprea, T.I.; Baskin, I.I.; Varnek, A.; Roitberg, A.; et al. QSAR without borders. Chem. Soc. Rev. 2020, 49, 3525–3564.
- Wilamowski, B. Neural network architectures and learning algorithms. IEEE Ind. Electron. Mag. 2009, 3, 56–63.
- Golbraikh, A.; Tropsha, A. Predictive QSAR modeling based on diversity sampling of experimental datasets for the training and test set selection. J. Comput.-Aided Mol. Des. 2002, 16, 357–369.
- Mauri, A.; Consonni, V.; Todeschini, R. Molecular Descriptors. In Handbook of Computational Chemistry; Springer: Cham, Switzerland, 2016; pp. 1–29.
- Weininger, D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 1988, 28, 31–36.
- Ponzoni, I.; Sebastián-Pérez, V.; Martínez, M.J.; Roca, C.; De la Cruz Pérez, C.; Cravero, F.; Vazquez, G.E.; Páez, J.A.; Díaz, M.F.; Campillo, N.E. QSAR Classification Models for Predicting the Activity of Inhibitors of Beta-Secretase (BACE1) Associated with Alzheimer’s Disease. Sci. Rep. 2019, 9, 9102.
- Zhang, J.; Norinder, U.; Svensson, F. Deep Learning-Based Conformal Prediction of Toxicity. J. Chem. Inf. Model. 2021, 61, 2648–2657.
- Winter, R.; Montanari, F.; Noé, F.; Clevert, D.-A. Learning continuous and data-driven molecular descriptors by translating equivalent chemical representations. Chem. Sci. 2019, 10, 1692–1701.
- David, L.; Thakkar, A.; Mercado, R.; Engkvist, O. Molecular representations in AI-driven drug discovery: A review and practical guide. J. Cheminform. 2020, 12, 56.
- Nazarova, A.L.; Yang, L.; Liu, K.; Mishra, A.; Kalia, R.K.; Nomura, K.-I.; Nakano, A.; Vashishta, P.; Rajak, P. Dielectric Polymer Property Prediction Using Recurrent Neural Networks with Optimizations. J. Chem. Inf. Model. 2021, 61, 2175–2186.
- Mendez, D.; Gaulton, A.; Bento, A.P.; Chambers, J.; De Veij, M.; Félix, E.; Magariños, M.P.; Mosquera, J.F.; Mutowo, P.; Nowotka, M.; et al. ChEMBL: Towards direct deposition of bioassay data. Nucleic Acids Res. 2019, 47, D930–D940.
- Davies, M.; Nowotka, M.; Papadatos, G.; Dedman, N.; Gaulton, A.; Atkinson, F.; Bellis, L.; Overington, J.P. ChEMBL web services: Streamlining access to drug discovery data and utilities. Nucleic Acids Res. 2015, 43, W612–W620.
- Golbraikh, A.; Tropsha, A. Beware of q2! J. Mol. Graph. Model. 2002, 20, 269–276.
- Alexander, D.L.J.; Tropsha, A.; Winkler, D.A. Beware of R2: Simple, Unambiguous Assessment of the Prediction Accuracy of QSAR and QSPR Models. J. Chem. Inf. Model. 2015, 55, 1316–1322.
- Kendall, M.G.; Stuart, A. The Advanced Theory of Statistics. Volume 2: Inference and Relationship; Hafner Publishing Company: New York, NY, USA, 1961.
- Riedmiller, M.; Braun, H. A direct adaptive method for faster backpropagation learning: The RPROP algorithm. In Proceedings of the IEEE International Conference on Neural Networks, San Francisco, CA, USA, 28 March–1 April 1993; pp. 586–591.
- Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980.
- Van Haaster, M.C.; McDonough, A.A.; Gurley, S.B. Blood pressure regulation by the angiotensin type 1 receptor in the proximal tubule. Curr. Opin. Nephrol. Hypertens. 2018, 27, 1–7.
- Fatima, N.; Patel, S.N.; Hussain, T. Angiotensin II Type 2 Receptor: A Target for Protection Against Hypertension, Metabolic Dysfunction, and Organ Remodeling. Hypertension 2021, 77, 1845–1856.
- Royea, J.; Lacalle-Aurioles, M.; Trigiani, L.J.; Fermigier, A.; Hamel, E. AT2R’s (Angiotensin II Type 2 Receptor’s) Role in Cognitive and Cerebrovascular Deficits in a Mouse Model of Alzheimer Disease. Hypertension 2020, 75, 1464–1474.
- Bond, J.S. Proteases: History, discovery, and roles in health and disease. J. Biol. Chem. 2019, 294, 1643–1651.
- Sagawa, T.; Inoue, K.-I.; Takano, H. Use of protease inhibitors for the prevention of COVID-19. Prev. Med. 2020, 141, 106280.
- Wang, Y.; Lv, Z.; Chu, Y. HIV protease inhibitors: A review of molecular selectivity and toxicity. HIV/AIDS Res. Palliat. Care 2015, 7, 95.
- Patel, N.; Huang, X.P.; Grandner, J.M.; Johansson, L.C.; Stauch, B.; McCorvy, J.D.; Liu, Y.; Roth, B.; Katritch, V. Structure-based discovery of potent and selective melatonin receptor agonists. eLife 2020, 9, e53779.
- Sun, W.; Zheng, Y.; Yang, K.; Zhang, Q.; Shah, A.A.; Wu, Z.; Sun, Y.; Feng, L.; Chen, D.; Xiao, Z.; et al. Machine learning–assisted molecular design and efficiency prediction for high-performance organic photovoltaic materials. Sci. Adv. 2019, 5, eaay4275.
- Remington, J.M.; Ferrell, J.B.; Zorman, M.; Petrucci, A.; Schneebeli, S.T.; Li, J. Machine Learning in a Molecular Modeling Course for Chemistry, Biochemistry, and Biophysics Students. Biophys. 2020, 1, 11.
- Doan Tran, H.; Kim, C.; Chen, L.; Chandrasekaran, A.; Batra, R.; Venkatram, S.; Kamal, D.; Lightstone, J.P.; Gurnani, R.; Shetty, P.; et al. Machine-learning predictions of polymer properties with Polymer Genome. J. Appl. Phys. 2020, 128, 171104.
- Arabnia, H.R.; Deligiannidis, L.; Grimaila, M.R.; Hodson, D.D.; Joe, K.; Sekijima, M.; Tinetti, F.G. Advances in Parallel & Distributed Processing, and Applications; Includes all accepted papers of PDPTA, CSC, MSV, GCC 2020; Springer: Berlin/Heidelberg, Germany, 2020.
- Segler, M.H.S.; Kogej, T.; Tyrchan, C.; Waller, M.P. Generating Focused Molecule Libraries for Drug Discovery with Recurrent Neural Networks. ACS Cent. Sci. 2017, 4, 120–131.
- Li, X.; Fourches, D. SMILES Pair Encoding: A Data-Driven Substructure Tokenization Algorithm for Deep Learning. J. Chem. Inf. Model. 2021, 61, 1560–1569.
- O’Boyle, N.; Dalke, A. DeepSMILES: An Adaptation of SMILES for Use in Machine-Learning of Chemical Structures. ChemRxiv 2018.
- Krenn, M.; Häse, F.; Nigam, A.; Friederich, P.; Aspuru-Guzik, A. Self-referencing embedded strings (SELFIES): A 100% robust molecular string representation. Mach. Learn. Sci. Technol. 2020, 1, 045024.
- Fite, S.; Nitecki, O.; Gross, Z. Custom Tokenization Dictionary, CUSTODI: A General, Fast, and Reversible Data-Driven Representation and Regressor. J. Chem. Inf. Model. 2021, 61, 3285–3291.
- Drefahl, A. CurlySMILES: A chemical language to customize and annotate encodings of molecular and nanodevice structures. J. Cheminform. 2011, 3, 1.
- Toropova, A.P.; Toropov, A.A.; Veselinović, A.M.; Veselinović, J.B.; Leszczynska, D.; Leszczynski, J. Quasi-SMILES as a Novel Tool for Prediction of Nanomaterials’ Endpoints. In Multi-Scale Approaches in Drug Discovery: From Empirical Knowledge to In Silico Experiments and Back; Speck-Planche, A., Ed.; Elsevier: Amsterdam, The Netherlands, 2017; pp. 191–221.
- Ropp, P.J.; Kaminsky, J.C.; Yablonski, S.; Durrant, J.D. Dimorphite-DL: An open-source program for enumerating the ionization states of drug-like small molecules. J. Cheminform. 2019, 11, 14.
- Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536.
- Desai, M.; Shah, M. An anatomization on breast cancer detection and diagnosis employing multi-layer perceptron neural network (MLP) and convolutional neural network (CNN). Clin. eHealth 2021, 4, 1–11.
- Xu, K.; Hu, W.; Leskovec, J.; Jegelka, S. How Powerful are Graph Neural Networks? arXiv 2018, arXiv:1810.00826.
- Rácz, A.; Bajusz, D.; Héberger, K. Effect of Dataset Size and Train/Test Split Ratios in QSAR/QSPR Multiclass Classification. Molecules 2021, 26, 1111.
- Tan, J.; Yang, J.; Wu, S.; Chen, G.; Zhao, J. A critical look at the current train/test split in machine learning. arXiv 2021, arXiv:2106.04525.
- Puzyn, T.; Mostrag-Szlichtyng, A.; Gajewicz, A.; Skrzyński, M.; Worth, A.P. Investigating the influence of data splitting on the predictive ability of QSAR/QSPR models. Struct. Chem. 2011, 22, 795–804.
- Martin, T.M.; Harten, P.; Young, D.M.; Muratov, E.N.; Golbraikh, A.; Zhu, H.; Tropsha, A. Does Rational Selection of Training and Test Sets Improve the Outcome of QSAR Modeling? J. Chem. Inf. Model. 2012, 52, 2570–2578.
- Ng, W.; Minasny, B.; Malone, B.; Filippi, P. In search of an optimum sampling algorithm for prediction of soil properties from infrared spectra. PeerJ 2018, 6, e5722.
- Snarey, M.; Terrett, N.K.; Willett, P.; Wilton, D.J. Comparison of algorithms for dissimilarity-based compound selection. J. Mol. Graph. Model. 1997, 15, 372–385.
- Kennard, R.W.; Stone, L.A. Computer Aided Design of Experiments. Technometrics 1969, 11, 137–148.
- Golbraikh, A.; Shen, M.; Xiao, Z.; Xiao, Y.-D.; Lee, K.-H.; Tropsha, A. Rational selection of training and test sets for the development of validated QSAR models. J. Comput.-Aided Mol. Des. 2003, 17, 241–253.
- Puggina Bianchesi, N.M.; Romao, E.L.; Lopes, M.F.B.P.; Balestrassi, P.P.; De Paiva, A.P. A Design of Experiments Comparative Study on Clustering Methods. IEEE Access 2019, 7, 167726–167738.
- Gobbi, A.; Giannetti, A.M.; Chen, H.; Lee, M.-L. Atom-Atom-Path similarity and Sphere Exclusion clustering: Tools for prioritizing fragment hits. J. Cheminform. 2015, 7, 11.
- Jain, A.K.; Murty, M.N.; Flynn, P.J. Data clustering. ACM Comput. Surv. 1999, 31, 264–323.
- Rojas, R. Neural Networks; Springer: Berlin/Heidelberg, Germany, 1996.
- Van Ooyen, A.; Nienhuis, B. Improving the convergence of the back-propagation algorithm. Neural Netw. 1992, 5, 465–471.
- Hagiwara, K. Regularization learning, early stopping and biased estimator. Neurocomputing 2002, 48, 937–955.
- Zur, R.M.; Jiang, Y.; Pesce, L.L.; Drukker, K. Noise injection for training artificial neural networks: A comparison with weight decay and early stopping. Med. Phys. 2009, 36, 4810–4818.
- Yao, Y.; Rosasco, L.; Caponnetto, A. On Early Stopping in Gradient Descent Learning. Constr. Approx. 2007, 26, 289–315.
- Reed, R.; Marks, R.J., II. Neural Smithing; MIT Press: Cambridge, MA, USA, 1999.
- Igel, C.; Hüsken, M. Empirical evaluation of the improved Rprop learning algorithms. Neurocomputing 2003, 50, 105–123.
- Xinxing, P.; Lee, B.; Chunrong, Z. A comparison of neural network backpropagation algorithms for electricity load forecasting. In Proceedings of the 2013 IEEE International Workshop on Intelligent Energy Systems (IWIES), Vienna, Austria, 14 November 2013; pp. 22–27.
- Avan, E.; Sartono, B. Comparison of Backpropagation and Resilient Backpropagation Algorithms in Non-Invasive Blood Glucose Measuring Device. Int. J. Eng. Res. 2017, 8, 153–157.
- Yu, S.; Príncipe, J.C. Understanding autoencoders with information theoretic concepts. Neural Netw. 2019, 117, 104–123.
- Sachs, L. Applied Statistics. A Handbook of Techniques; Springer: Berlin/Heidelberg, Germany, 1984; p. 349.
Kennard–Stone-Based Train–Test Splitting

| VLA-SMILES format | k = 1 | k = 2 | k = 4 | k = 6 | k = 8 | k = 12 | k = 16 |
|---|---|---|---|---|---|---|---|
| Minimum RMSE for testing set | 0.77 | 0.77 | 0.88 | 0.84 | 0.94 | 0.95 | 0.89 |
| Minimum RMSE for testing set | 0.82 | 0.79 | 0.84 | 0.94 | 0.93 | 0.99 | 0.93 |

Ranking-by-Activity-Based Train–Test Splitting

| VLA-SMILES format | k = 1 | k = 2 | k = 4 | k = 6 | k = 8 | k = 12 | k = 16 |
|---|---|---|---|---|---|---|---|
| Minimum RMSE for testing set | 0.87 | 0.95 | 0.88 | 0.87 | 1.02 | 0.87 | 0.94 |
| Minimum RMSE for testing set | 1.01 | 1.18 | 1.14 | 1.21 | 1.29 | 1.11 | 1.26 |
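Each entry above is the minimum testing-set root-mean-square error achieved by the corresponding model. For concreteness, a minimal sketch of that metric under its standard definition (the helper name `rmse` is illustrative, not taken from the supplementary code):

```cpp
// Testing-set RMSE between predicted and experimental activities,
// assuming the standard definition (illustrative helper).
#include <cassert>
#include <cmath>
#include <vector>

double rmse(const std::vector<double>& yPred, const std::vector<double>& yExp) {
    assert(yPred.size() == yExp.size() && !yPred.empty());
    double sumSq = 0.0;
    for (std::size_t i = 0; i < yPred.size(); ++i) {
        const double d = yPred[i] - yExp[i];
        sumSq += d * d;
    }
    return std::sqrt(sumSq / static_cast<double>(yPred.size()));
}
```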
VLA-SMILES Representation

| | k = 1 | k = 2 | k = 4 | k = 6 | k = 8 | k = 12 | k = 16 |
|---|---|---|---|---|---|---|---|
| | 17,016.32 | 4254.08 | 1063.52 | 472.67 | 265.88 | 118.17 | 66.47 |
| | 17,651.90 | 4668.33 | 1199.30 | 503.09 | 268.03 | 116.56 | 66.47 |
Kennard–Stone-Based Train–Test Splitting

| VLA-SMILES format | k = 1 | k = 2 | k = 4 | k = 6 | k = 8 | k = 12 | k = 16 |
|---|---|---|---|---|---|---|---|
| Minimum RMSE for testing set | 0.81 | 0.80 | 0.85 | 0.96 | 0.90 | 0.98 | 0.91 |
| Minimum RMSE for testing set | 0.84 | 0.80 | 0.84 | 0.93 | 0.90 | 0.96 | 0.95 |
| Minimum RMSE for testing set | 0.84 | 0.82 | 0.84 | 0.93 | 0.93 | 1.02 | 0.90 |
MLP with Two Hidden Layers

| VLA-SMILES format | k = 1 | k = 2 | k = 4 | k = 6 | k = 8 | k = 12 | k = 16 |
|---|---|---|---|---|---|---|---|
| Minimum RMSE for testing set | 0.81 | 0.87 | 0.85 | 0.89 | 0.85 | 0.98 | 0.90 |
| Minimum RMSE for testing set | 0.81 | 0.80 | 0.86 | 1.01 | 0.94 | 0.98 | 0.90 |
VLA-SMILES

| VLA-SMILES format | k = 1 | k = 2 | k = 4 | k = 6 | k = 8 | k = 12 | k = 16 |
|---|---|---|---|---|---|---|---|
| Minimum RMSE for testing set | 0.85 | 0.84 | 0.88 | 0.94 | 0.99 | 1.03 | 0.92 |
VLA-SMILES-Based Representation

| | k = 1 | k = 2 | k = 4 | k = 6 | k = 8 | k = 12 | k = 16 |
|---|---|---|---|---|---|---|---|
| | 0.58 | 0.58 | 0.44 | 0.57 | 0.48 | 0.47 | 0.47 |
| | 0.58 | 0.61 | 0.40 | 0.58 | 0.51 | 0.48 | 0.55 |
| | 0.60 | 0.62 | 0.46 | 0.65 | 0.59 | 0.47 | 0.48 |
| | 0.95 | 0.93 | 0.98 | 0.89 | 0.85 | 1.00 | 0.98 |
| | 0.02 | 0.018 | 0.039 | 0.01 | 0.01 | 0.02 | 0.01 |
| | 0.57 | 0.58 | 0.40 | 0.58 | 0.50 | 0.47 | 0.46 |
| | 0.56 | 0.56 | 0.36 | 0.57 | 0.49 | 0.45 | 0.45 |
| | 23.46 | 24.47 | 33.53 | 50.54 | 50.47 | 42.88 | 40.11 |
| | 7.46 | 7.60 | 8.75 | 8.28 | 9.45 | 8.96 | 9.44 |
| | −0.07 | −0.17 | 0.036 | −0.04 | −0.09 | −0.11 | −0.28 |
| | 0.62 | 0.65 | 0.44 | 0.66 | 0.60 | 0.48 | 0.53 |
| | 13.27 | 13.41 | 23.62 | 8.69 | 11.06 | 26.07 | 30.49 |
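Predictive-ability statistics of this kind are conventionally checked against the acceptability criteria of Golbraikh and Tropsha ("Beware of q2!", cited above): a sufficiently high squared correlation between predicted and experimental activities together with a near-unity slope of the regression through the origin. A minimal sketch under those published definitions (helper names and the hard-coded thresholds follow the cited paper; this is illustrative code, not the authors' implementation):

```cpp
// Sketch of Golbraikh-Tropsha-style acceptability checks for a QSAR model
// (subset of the criteria in "Beware of q2!", J. Mol. Graph. Model. 2002).
#include <cstddef>
#include <numeric>
#include <vector>

static double pearsonR2(const std::vector<double>& x, const std::vector<double>& y) {
    const double n = static_cast<double>(x.size());
    const double mx = std::accumulate(x.begin(), x.end(), 0.0) / n;
    const double my = std::accumulate(y.begin(), y.end(), 0.0) / n;
    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sxy += (x[i] - mx) * (y[i] - my);
        sxx += (x[i] - mx) * (x[i] - mx);
        syy += (y[i] - my) * (y[i] - my);
    }
    return (sxy * sxy) / (sxx * syy); // squared correlation coefficient
}

// True if predictions pass the usual thresholds: R^2 > 0.6 and the slope k of
// the through-origin regression of predicted on experimental in [0.85, 1.15].
bool acceptablyPredictive(const std::vector<double>& yPred, const std::vector<double>& yExp) {
    double num = 0.0, den = 0.0;
    for (std::size_t i = 0; i < yPred.size(); ++i) {
        num += yPred[i] * yExp[i];
        den += yExp[i] * yExp[i];
    }
    const double k = num / den; // slope of regression through the origin
    return pearsonR2(yPred, yExp) > 0.6 && k >= 0.85 && k <= 1.15;
}
```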
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite

Nazarova, A.L.; Nakano, A. VLA-SMILES: Variable-Length-Array SMILES Descriptors in Neural Network-Based QSAR Modeling. Mach. Learn. Knowl. Extr. 2022, 4, 715–737. https://doi.org/10.3390/make4030034