Algorithms, Vol. 13, No. 6 (2020)
ARTICLE
TITLE

Unsupervised Text Feature Selection Using Memetic Dichotomous Differential Evolution

Ibraheem Al-Jadir    
Kok Wai Wong    
Chun Che Fung and Hong Xie    

Abstract

Feature selection (FS) methods have been studied extensively in the literature and are a crucial component of machine learning techniques. However, unsupervised text feature selection has not been well studied for document clustering problems. Feature selection can be modelled as an optimization problem because of the large number of possible valid solutions. In this paper, a memetic method that combines Differential Evolution (DE) with Simulated Annealing (SA) is proposed for unsupervised FS. Because each feature is encoded by only two values, indicating its presence or absence, a binary version of DE is used. A dichotomous DE serves as this binary version, and the proposed method is named Dichotomous Differential Evolution Simulated Annealing (DDESA). It replaces the standard DE mutation with a dichotomous mutation, which is better suited to binary search spaces. The Mean Absolute Distance (MAD) filter is used as the internal evaluation measure for feature subsets. The proposed method was compared on five benchmark datasets with other state-of-the-art methods, including the standard DE combined with SA, which is named DESA in this paper. The F-micro and F-macro scores, the Average Distance of Document to Cluster (ADDC), and the Reduction Rate (RR) were used as evaluation measures. Test results showed that the proposed DDESA outperformed the other tested methods in unsupervised text feature selection.
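The general shape of such a memetic search can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything beyond the abstract is an assumption: the MAD score is taken to be each term's mean absolute distance from its own mean across documents, a subset's fitness is taken to be the average MAD of its selected terms, and `binary_mutation` is a simple stand-in operator, not the paper's dichotomous mutation. All function names and parameter values are hypothetical.

```python
import math
import random


def mad_scores(docs):
    """Mean absolute distance of each term (column) from its own mean.
    Assumed definition; the abstract does not give the formula."""
    n = len(docs)
    scores = []
    for column in zip(*docs):
        mean = sum(column) / n
        scores.append(sum(abs(v - mean) for v in column) / n)
    return scores


def fitness(mask, scores):
    """Average MAD of the selected terms; 0.0 for an empty subset."""
    chosen = [s for bit, s in zip(mask, scores) if bit]
    return sum(chosen) / len(chosen) if chosen else 0.0


def binary_mutation(a, b, rng):
    """Stand-in binary mutation (an assumption, not the paper's operator):
    keep bits where the two donors agree, draw a random bit where they differ."""
    return [x if x == y else rng.randint(0, 1) for x, y in zip(a, b)]


def anneal(mask, scores, rng, steps=50, temp=1.0, cooling=0.9):
    """Simulated-annealing local search using single bit flips."""
    current, best = list(mask), list(mask)
    for _ in range(steps):
        candidate = list(current)
        i = rng.randrange(len(candidate))
        candidate[i] = 1 - candidate[i]
        delta = fitness(candidate, scores) - fitness(current, scores)
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature cools.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current = candidate
            if fitness(current, scores) > fitness(best, scores):
                best = list(current)
        temp *= cooling
    return best


def select_features(docs, pop_size=6, generations=10, seed=0):
    """Evolve binary masks (1 = keep term) and refine each trial with SA."""
    rng = random.Random(seed)
    scores = mad_scores(docs)
    n_terms = len(scores)
    pop = [[rng.randint(0, 1) for _ in range(n_terms)] for _ in range(pop_size)]
    for _ in range(generations):
        for i in range(pop_size):
            a, b = rng.sample(pop, 2)
            trial = anneal(binary_mutation(a, b, rng), scores, rng)
            # Greedy replacement, as in standard DE selection.
            if fitness(trial, scores) >= fitness(pop[i], scores):
                pop[i] = trial
    return max(pop, key=lambda m: fitness(m, scores))
```

Calling `select_features` on a small document-term matrix (rows are documents, columns are term weights) returns a 0/1 mask over the terms. Because this toy fitness averages MAD over the selected terms, it tends to favor very small subsets; a realistic objective would need to weigh subset size against clustering quality.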
