Algorithms, Vol. 14, No. 4 (2021)
ARTICLE
TITLE

An Improved Artificial Bee Colony for Feature Selection in QSAR

Yanhong Lin    
Jing Wang    
Xiaolin Li    
Yuanzi Zhang and Shiguo Huang    

Abstract

Quantitative Structure–Activity Relationship (QSAR) modeling aims to correlate molecular structure properties with the corresponding bioactivity. Chance correlations and multicollinearity are two major problems often encountered when generating QSAR models. Feature selection can significantly improve the accuracy and interpretability of QSAR models by removing redundant or irrelevant molecular descriptors. The artificial bee colony (ABC) algorithm, which mimics the foraging behavior of a honey bee colony, was originally proposed for continuous optimization problems. It has been applied to feature selection for classification but seldom for regression analysis and prediction. In this paper, a binary ABC algorithm is used to select features (molecular descriptors) in QSAR. Furthermore, we propose an improved ABC-based algorithm for feature selection in QSAR, namely ABC-PLS-1. Crossover and mutation operators are introduced into the employed bee and onlooker bee phases to modify several dimensions of each solution, which not only removes the step of converting continuous values into discrete values but also reduces the computational cost. In addition, a novel greedy selection strategy, which favors feature subsets with higher accuracy and fewer features, helps the algorithm converge quickly. Three QSAR datasets are used to evaluate the proposed algorithm. Experimental results show that ABC-PLS-1 outperforms PSO-PLS, WS-PSO-PLS, and BFDE-PLS in accuracy, root mean square error, and the number of selected features. Moreover, we study whether the scout bee phase is needed when tackling regression problems and conclude that it is redundant for feature selection in low-dimensional and medium-dimensional regression problems.
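The ideas named in the abstract (binary feature masks, crossover and mutation in place of a continuous-to-discrete conversion, and a greedy rule that prefers higher accuracy and then fewer features) can be illustrated with a minimal sketch. This is not the authors' ABC-PLS-1 implementation. The function names, the one-point crossover, the bit-flip mutation rate, and `toy_score` (a hypothetical stand-in for a cross-validated PLS accuracy measure) are all assumptions made for illustration. The sketch also collapses the employed and onlooker phases into a single pass and omits the scout phase.

```python
import random


def crossover(a, b, rng):
    """One-point crossover of two equal-length binary feature masks."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(mask, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in mask]


def better(cand, cand_score, cur, cur_score):
    """Greedy rule: higher score wins; on a tie, fewer selected features wins."""
    if cand_score != cur_score:
        return cand_score > cur_score
    return sum(cand) < sum(cur)


def toy_score(mask):
    # Hypothetical stand-in for model accuracy: features 0, 2 and 5 are
    # treated as informative, all others contribute nothing.
    informative = {0, 2, 5}
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in informative)
    return hits / len(informative)


def select_features(n_features, score, n_food=10, n_iter=50, rate=0.1, seed=0):
    """Search binary masks with crossover, mutation and greedy replacement."""
    rng = random.Random(seed)
    foods = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_food)]
    scores = [score(f) for f in foods]
    for _ in range(n_iter):
        for i in range(n_food):
            # Pick a different food source as the crossover partner.
            j = rng.choice([k for k in range(n_food) if k != i])
            cand = mutate(crossover(foods[i], foods[j], rng), rate, rng)
            cand_score = score(cand)
            if better(cand, cand_score, foods[i], scores[i]):
                foods[i], scores[i] = cand, cand_score
    best = max(range(n_food), key=lambda k: (scores[k], -sum(foods[k])))
    return foods[best], scores[best]


if __name__ == "__main__":
    mask, value = select_features(8, toy_score)
    print(mask, value)
```

In a real QSAR setting, `toy_score` would be replaced by a measure computed from a fitted regression model on the selected descriptors. Because the candidate is already a binary mask, no thresholding step is needed before scoring it.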

Similar articles

       
 
Marco Leo, Pierluigi Carcagnì, Luca Signore, Francesco Corcione, Giulio Benincasa, Mikko O. Laukkanen and Cosimo Distante    
Colorectal cancer is one of the most lethal cancers because of late diagnosis and challenges in the selection of therapy options. The histopathological diagnosis of colon adenocarcinoma is hindered by poor reproducibility and a lack of standard examinati...
Journal: AI

 
Waseem Abbas, Zuping Zhang, Muhammad Asim, Junhong Chen and Sadique Ahmad    
In the ever-expanding online fashion market, businesses in the clothing sales sector are presented with substantial growth opportunities. To utilize this potential, it is crucial to implement effective methods for accurately identifying clothing items. T...
Journal: Information

 
Siyao Lu, Rui Xu, Zhaoyu Li, Bang Wang and Zhijun Zhao    
The International Lunar Research Station, to be established around 2030, will equip lunar rovers with robotic arms as constructors. Construction requires lunar soil and lunar rovers, for which rovers must go toward different waypoints without encounterin...
Journal: Aerospace

 
Mohammed Saïd Kasttet, Abdelouahid Lyhyaoui, Douae Zbakh, Adil Aramja and Abderazzek Kachkari    
Recently, artificial intelligence and data science have witnessed dramatic progress and rapid growth, especially Automatic Speech Recognition (ASR) technology based on Hidden Markov Models (HMMs) and Deep Neural Networks (DNNs). Consequently, new end-to-...
Journal: Aerospace

 
Pablo Caballero, Luis Gonzalez-Abril, Juan A. Ortega and Áurea Simon-Soro    
Endometriosis (EM) is a chronic inflammatory estrogen-dependent disorder that affects 10% of women worldwide. It affects the female reproductive tract and its resident microbiota, as well as distal body sites that can serve as surrogate markers of EM. Cu...
Journal: Algorithms