Comparative Study for Prediction of Low and High Plasma Protein Binding Drugs by Various Machine Learning-Based Classification Algorithms


  • Jaipur National University, School of Life Sciences, Jaipur, Rajasthan, 302025, India
  • Birla Institute of Applied Sciences, Nainital, Uttarakhand, 263136, India
  • National Centre for Cell Science, Pune, Maharashtra, 411007, India


In the drug discovery pipeline, most drug candidates fail at the early stages because of their pharmacokinetic behavior in the system. Early prediction and screening of pharmacokinetic properties can reduce the time and investment needed for lead discovery. Plasma protein binding is one such property and plays a vital role in drug discovery and development. The aim of the current study is to develop a computational model that classifies drugs as Low Plasma Protein Binding (LPPB) or High Plasma Protein Binding (HPPB), using machine learning methods in WEKA for the early screening of molecules. Plasma protein binding data were collated from the DrugBank database, in which 617 drug candidates were found to interact with plasma proteins; from these, equal proportions of high and low plasma protein binding drugs were extracted to build a training set of ~300 drugs. The machine learning algorithms were trained on the training set and evaluated on a test set. We compared several machine learning-based classification algorithms, namely the Naïve Bayes algorithm, the Instance-Based Learner (IBK), the multilayer perceptron, and random forest, to determine the best model based on accuracy. The random forest-based model outperformed the other classification methods, with an accuracy of 99.67% and a kappa value of 0.9933 on the training set and on the test set, and can predict the plasma protein binding capacity of drugs in the given data set using the WEKA tool.
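The two reported figures, accuracy and kappa, are related through the confusion matrix, and the relationship can be illustrated with a short Python sketch. The function name `accuracy_and_kappa` and the confusion matrix below are hypothetical: the abstract does not report per-class counts, so the matrix is only an assumed example of a balanced two-class set of 300 drugs with a single misclassification. It is not the study's actual output, and WEKA itself is not used here.

```python
def accuracy_and_kappa(matrix):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix.

    matrix[i][j] is the number of items whose true class is i and whose
    predicted class is j.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("confusion matrix is empty")

    # Observed agreement: fraction of items on the diagonal (the accuracy).
    observed = sum(matrix[i][i] for i in range(n)) / total

    # Agreement expected by chance, from the row and column totals.
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2

    if expected == 1.0:
        # Kappa is undefined when chance agreement is already perfect.
        return observed, float("nan")
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical counts (an assumption, not taken from the study):
# rows are the true classes (LPPB, HPPB), columns the predicted classes.
hypothetical_matrix = [
    [150, 0],
    [1, 149],
]

acc, kappa = accuracy_and_kappa(hypothetical_matrix)
print(f"accuracy = {acc:.2%}, kappa = {kappa:.4f}")
```

For a balanced two-class set the chance agreement is 0.5, so kappa reduces to 2 × accuracy − 1. Under the assumed counts, this gives an accuracy of 99.67% and a kappa of 0.9933, which is consistent with the values quoted in the abstract.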


Drug Discovery, Machine Learning, Multilayer Perceptron, Pharmacokinetic Plasma Protein Binding, Random Forest

Subject Discipline

Pharmacoinformatics, Pharmaceutical, Bioinformatics, Machine Learning


