Algorithms, Vol. 14, No. 4 (2021)
ARTICLE
TITLE

An Improved Artificial Bee Colony for Feature Selection in QSAR

Yanhong Lin    
Jing Wang    
Xiaolin Li    
Yuanzi Zhang and Shiguo Huang    

Abstract

Quantitative Structure-Activity Relationship (QSAR) modeling aims to correlate molecular structure properties with the corresponding bioactivity. Chance correlations and multicollinearity are two major problems often encountered when generating QSAR models. Feature selection can significantly improve the accuracy and interpretability of QSAR models by removing redundant or irrelevant molecular descriptors. The artificial bee colony (ABC) algorithm, which mimics the foraging behavior of honey bee colonies, was originally proposed for continuous optimization problems. It has been applied to feature selection for classification but seldom to regression analysis and prediction. In this paper, a binary ABC algorithm is used to select features (molecular descriptors) in QSAR. Furthermore, we propose an improved ABC-based algorithm for feature selection in QSAR, namely ABC-PLS-1. Crossover and mutation operators are introduced into the employed bee and onlooker bee phases to modify several dimensions of each solution, which not only removes the need to convert continuous values into discrete values but also reduces the computational cost. In addition, a novel greedy selection strategy that favors feature subsets with higher accuracy and fewer features helps the algorithm converge quickly. Three QSAR datasets are used to evaluate the proposed algorithm. Experimental results show that ABC-PLS-1 outperforms PSO-PLS, WS-PSO-PLS, and BFDE-PLS in accuracy, root mean square error, and the number of selected features. Moreover, we study whether the scout bee phase should be implemented when tackling regression problems, and conclude that the scout bee phase is redundant for feature selection in low-dimensional and medium-dimensional regression problems.
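
The abstract gives no pseudocode, so the following Python sketch only illustrates the general idea of a binary bee update that uses crossover, mutation, and a greedy rule preferring higher scores and, on ties, fewer features. It is not the authors' ABC-PLS-1. The uniform crossover, the per-bit mutation rate, the tie-breaking rule, and the names `crossover`, `mutate`, `better`, `bee_step`, and `toy_fitness` are all assumptions made for illustration. In a real QSAR setting the fitness would be a cross-validated PLS score rather than the toy function used here.

```python
import random
from typing import Callable, List, Sequence

Mask = List[int]  # 1 = descriptor selected, 0 = descriptor dropped


def crossover(parent: Sequence[int], partner: Sequence[int], rng: random.Random) -> Mask:
    """Uniform crossover (assumed): each bit comes from one of the two masks."""
    return [p if rng.random() < 0.5 else q for p, q in zip(parent, partner)]


def mutate(mask: Sequence[int], rate: float, rng: random.Random) -> Mask:
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in mask]


def better(cand: Sequence[int], cand_score: float,
           cur: Sequence[int], cur_score: float) -> bool:
    """Greedy rule (assumed): higher score wins; on a tie, the smaller subset wins."""
    if cand_score != cur_score:
        return cand_score > cur_score
    return sum(cand) < sum(cur)


def bee_step(population: List[Mask], scores: List[float],
             fitness: Callable[[Sequence[int]], float],
             rate: float, rng: random.Random) -> None:
    """One employed-bee-style pass; updates population and scores in place."""
    n = len(population)
    for i in range(n):
        # Pick a different solution to recombine with.
        j = rng.choice([k for k in range(n) if k != i])
        cand = mutate(crossover(population[i], population[j], rng), rate, rng)
        if sum(cand) == 0:
            continue  # an empty feature subset cannot be scored
        cand_score = fitness(cand)
        if better(cand, cand_score, population[i], scores[i]):
            population[i], scores[i] = cand, cand_score


def toy_fitness(mask: Sequence[int]) -> float:
    """Hypothetical stand-in for a model score: counts selected 'relevant' bits."""
    relevant = (0, 2)
    return float(sum(mask[i] for i in relevant))


if __name__ == "__main__":
    rng = random.Random(0)
    population = [[1] + [rng.randint(0, 1) for _ in range(5)] for _ in range(5)]
    scores = [toy_fitness(m) for m in population]
    for _ in range(50):
        bee_step(population, scores, toy_fitness, rate=0.1, rng=rng)
    best = max(range(len(population)), key=lambda i: (scores[i], -sum(population[i])))
    print(population[best], scores[best])
```

Because the operators act directly on 0/1 masks, no continuous-to-binary conversion step is needed, which is the simplification the abstract describes. The greedy rule never accepts a lower-scoring candidate, so each solution's score is non-decreasing across passes.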
