Computers, Vol. 12, No. 6 (2023)

Arabic Sentiment Analysis Based on Word Embeddings and Deep Learning

Nasrin Elhassan    
Giuseppe Varone    
Rami Ahmed    
Mandar Gogate    
Kia Dashtipour    
Hani Almoamari    
Mohammed A. El-Affendi    
Bassam Naji Al-Tamimi    
Faisal Albalwy and Amir Hussain    

Abstract

Social media networks have grown exponentially over the last two decades, giving internet users the opportunity to communicate and exchange ideas on a variety of topics. As a result, opinion mining plays a crucial role in analyzing user opinions and applying them to guide choices, making it one of the most popular research areas in natural language processing. Although several languages, including English, have been studied extensively, comparatively little work has addressed Arabic. The morphological complexity and various dialects of the language make semantic analysis particularly challenging. Moreover, the lack of accurate pre-processing tools and limited resources are constraining factors. This study was motivated by the accomplishments of deep learning algorithms and word embeddings in English sentiment analysis. Extensive supervised machine learning experiments were conducted in which word embeddings were exploited to determine the sentiment of Arabic reviews. Three deep learning models were introduced: convolutional neural networks (CNNs), long short-term memory (LSTM), and a hybrid CNN-LSTM. The models used features learned by word embeddings such as Word2Vec and fastText rather than hand-crafted features. The models were tested on two benchmark Arabic datasets with different setups: the Hotel Arabic Reviews Dataset (HARD) for hotel reviews and Large-Scale Arabic Book Reviews (LABR) for book reviews. Comparative experiments utilized the three models with the two word embeddings and different setups of the datasets. The main novelty of this study is its exploration of the effectiveness of various word embeddings and of different setups of the benchmark datasets relating to balance, imbalance, and binary and multi-class classification.
Findings showed that, in most cases, the best results were obtained with the fastText word embedding on the HARD 2-imbalance dataset for all three proposed models: CNN, LSTM, and CNN-LSTM. Further, on the benchmark HARD dataset with fastText, the proposed CNN model outperformed the LSTM and CNN-LSTM models, with accuracies of 94.69%, 94.63%, and 94.54%, respectively. Although the worst results were obtained on the LABR 3-imbalance dataset with both Word2Vec and fastText, they still outperformed other researchers' state-of-the-art results on the same dataset.
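
The abstract refers to dataset setups such as "2-imbalance" and "3-imbalance" without describing how they are built. The following is a minimal sketch of one plausible construction, not the authors' procedure. It assumes reviews carry 1 to 5 star ratings, that ratings above 3 count as positive and below 3 as negative, that rating 3 is dropped in the binary setup and labeled neutral in the three-class setup, and that balancing is done by random undersampling. The names `to_label`, `build_setup`, and `toy_reviews` are hypothetical.

```python
import random
from collections import Counter


def to_label(rating, num_classes):
    """Map a 1-5 star rating to a sentiment label (assumed mapping)."""
    if num_classes not in (2, 3):
        raise ValueError("num_classes must be 2 or 3")
    if rating == 3:
        # Assumption: neutral reviews are dropped in the binary setup.
        return "neutral" if num_classes == 3 else None
    return "positive" if rating > 3 else "negative"


def build_setup(reviews, num_classes, balanced, seed=0):
    """Return (text, label) pairs for one dataset setup.

    reviews: iterable of (text, rating) pairs.
    balanced: if True, undersample every class to the smallest class size.
    """
    labeled = []
    for text, rating in reviews:
        label = to_label(rating, num_classes)
        if label is not None:
            labeled.append((text, label))
    if not balanced or not labeled:
        return labeled

    # Group examples by label, then sample the same number from each class.
    by_label = {}
    for text, label in labeled:
        by_label.setdefault(label, []).append((text, label))
    smallest = min(len(items) for items in by_label.values())
    rng = random.Random(seed)
    result = []
    for label in sorted(by_label):
        result.extend(rng.sample(by_label[label], smallest))
    return result


# Toy data for illustration only; not drawn from HARD or LABR.
toy_reviews = [
    ("r1", 5), ("r2", 5), ("r3", 4), ("r4", 4),
    ("r5", 3), ("r6", 1), ("r7", 2), ("r8", 5),
]

for classes in (2, 3):
    for balanced in (False, True):
        setup = build_setup(toy_reviews, classes, balanced)
        counts = Counter(label for _, label in setup)
        print(classes, "balanced" if balanced else "imbalanced", dict(counts))
```

Under these assumptions, the imbalanced setups keep the natural class distribution, while the balanced ones trade data volume for equal class sizes, which is one reason accuracy can differ across setups of the same dataset.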

Similar articles

       
 
Ali Al-Laith, Muhammad Shahbaz, Hind F. Alaskar and Asim Rehmat    
At a time when research in the field of sentiment analysis tends to study advanced topics in languages such as English, other languages such as Arabic still suffer from basic problems and challenges, most notably the availability of large corpora. Furth...
Journal: Applied Sciences

 
Enas Elgeldawi, Awny Sayed, Ahmed R. Galal and Alaa M. Zaki    
Machine learning models are used today to solve problems within a broad span of disciplines. If the proper hyperparameter tuning of a machine learning classifier is performed, significantly higher accuracy can be obtained. In this paper, a comprehensive ...
Journal: Informatics

 
Abdullah Y. Muaad, Hanumanthappa Jayappa, Mugahed A. Al-antari and Sungyoung Lee    
Arabic text classification is a process to simultaneously categorize the different contextual Arabic contents into a proper category. In this paper, a novel deep learning Arabic text computer-aided recognition (ArCAR) is proposed to represent and recogni...
Journal: Algorithms

 
Imane GUELLIL, Marcelo Mendoza, Faical Azouaou     pp. 124-135

 
Essam Kazem Al-Yasiri, Ahmed Al-Azawei    
Regardless of the clear growth of Arabic texts on social networking sites (SNSs), it is still difficult to understand or summarize users' opinions or perspectives on a specific topic. Accordingly, Arabic text classification is one of the most challenging...