Applied Sciences, Vol. 11, No. 5 (2021)

AraSenCorpus: A Semi-Supervised Approach for Sentiment Annotation of a Large Arabic Text Corpus

Ali Al-Laith, Muhammad Shahbaz, Hind F. Alaskar and Asim Rehmat

Abstract

While sentiment analysis research on languages such as English has moved on to advanced topics, other languages such as Arabic still face basic problems and challenges, most notably the lack of large corpora. Furthermore, manual annotation is time-consuming and difficult when the corpus is very large. This paper presents AraSenCorpus, a semi-supervised self-learning technique for extending an Arabic sentiment-annotated corpus with unlabeled data. We use a neural network to train a set of models on a manually labeled dataset containing 15,000 tweets. We then use these models to extend the corpus into a large Arabic sentiment corpus, also called "AraSenCorpus". AraSenCorpus contains 4.5 million tweets and covers both Modern Standard Arabic and some of the Arabic dialects. A long short-term memory (LSTM) deep learning classifier is trained and tested on the final corpus. We evaluate our proposed framework on two external benchmark datasets to verify that it improves Arabic sentiment classification. The experimental results show that our corpus outperforms existing state-of-the-art systems.
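
The self-learning idea described in the abstract (train on a small labeled seed, label the unlabeled pool, keep only confident predictions, retrain) can be illustrated with a minimal sketch. This is not the authors' implementation: the paper uses neural network models and Arabic tweets, whereas the sketch below swaps in a tiny naive Bayes classifier over English toy tokens so that it runs with the standard library alone. The function names, the confidence threshold of 0.75, and the round limit are all assumptions made for illustration.

```python
import math
from collections import Counter


def train(labeled):
    """Fit a tiny multinomial naive Bayes model (stand-in for the paper's neural models)."""
    counts, docs = {}, Counter()
    for text, label in labeled:
        counts.setdefault(label, Counter()).update(text.split())
        docs[label] += 1
    vocab = {w for wc in counts.values() for w in wc}
    return counts, docs, vocab


def predict(model, text):
    """Return (best_label, posterior probability of that label)."""
    counts, docs, vocab = model
    total_docs = sum(docs.values())
    log_scores = {}
    for label, wc in counts.items():
        n_tokens = sum(wc.values())
        score = math.log(docs[label] / total_docs)
        for w in text.split():
            if w in vocab:  # words never seen in training are ignored
                # Laplace (add-one) smoothing
                score += math.log((wc[w] + 1) / (n_tokens + len(vocab)))
        log_scores[label] = score
    # Convert log scores to normalised probabilities in a numerically stable way.
    top = max(log_scores.values())
    exp = {label: math.exp(s - top) for label, s in log_scores.items()}
    best = max(exp, key=exp.get)
    return best, exp[best] / sum(exp.values())


def self_train(labeled, unlabeled, threshold=0.75, max_rounds=5):
    """Grow the labeled set with confidently pseudo-labeled items.

    threshold and max_rounds are hypothetical settings, not values from the paper.
    """
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        model = train(labeled)
        confident, remaining = [], []
        for text in pool:
            label, confidence = predict(model, text)
            if confidence >= threshold:
                confident.append((text, label))
            else:
                remaining.append(text)
        if not confident:  # nothing new was learned, so stop early
            break
        labeled.extend(confident)
        pool = remaining
    return labeled, pool


# Toy data (English placeholders, assumed for illustration only).
seed = [("good great happy", "pos"), ("bad awful sad", "neg")]
unlabeled = ["great happy day", "awful sad day", "day"]

extended, leftover = self_train(seed, unlabeled)
print(len(extended), leftover)
```

In this toy run the two opinionated sentences are added to the corpus with pseudo-labels, while the neutral item "day" never reaches the threshold and stays in the unlabeled pool. The threshold controls the usual trade-off in self-training: a higher value admits fewer but cleaner pseudo-labels.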

Similar articles

       
 
Olga Tushkanova, Diana Levshun, Alexander Branitskiy, Elena Fedorchenko, Evgenia Novikova and Igor Kotenko    
Cyberattacks on cyber-physical systems (CPS) can lead to severe consequences, and therefore it is extremely important to detect them at early stages. However, there are several challenges to be solved in this area; they include an ability of the security...
Journal: Algorithms

 
Kokoy Siti Komariah, Ariana Tulus Purnomo, Ardianto Satriawan, Muhammad Ogin Hasanuddin, Casi Setianingsih and Bong-Kee Sin    
To pursue a healthy lifestyle, people are increasingly concerned about their food ingredients. Recently, it has become a common practice to use an online recipe to select the ingredients that match an individual's meal plan and healthy diet preference. T...
Journal: Informatics

 
Xuefeng Zhang, Youngsung Kim, Young-Chul Chung, Sangcheol Yoon, Sang-Yong Rhee and Yong Soo Kim    
Large-scale datasets, which have sufficient and identical quantities of data in each class, are the main factor in the success of deep-learning-based classification models for vision tasks. A shortage of sufficient data and interclass imbalanced data dis...
Journal: Applied Sciences

 
Julio Jerison E. Macrohon, Charlyn Nayve Villavicencio, X. Alphonse Inbaraj and Jyh-Horng Jeng    
With the increasing popularity of Twitter as both a social media platform and a data source for companies, decision makers, advertisers, and even researchers alike, data have been so massive that manual labeling is no longer feasible. This research uses ...
Journal: Information

 
Milad Memarzadeh, Ata Akbari Asanjan and Bryan Matthews    
Identifying safety anomalies and vulnerabilities in the aviation domain is a very expensive and time-consuming task. Currently, it is accomplished via manual forensic reviews by subject matter experts (SMEs). However, with the increase in the amount of d...
Journal: Aerospace