Journal: Applied Sciences, Vol. 12, Issue 23 (2022)
Title

Applying a Character-Level Model to a Short Arabic Dialect Sentence: A Saudi Dialect as a Case Study

Tahani Alqurashi    

Abstract

Arabic dialect identification (ADI) has recently drawn considerable interest among researchers in the fields of language recognition and natural language processing. This study investigated the use of a character-level model, which is effectively unrestricted in its vocabulary, to identify fine-grained Arabic dialects in short written text. The four main Saudi dialects spoken across the country were considered in this study. The proposed ADI approach consists of five main phases: dialect data collection, data preprocessing and labelling, character-based feature extraction, character-based modelling (a deep learning model and classical machine learning models), and model performance evaluation. Several classical machine learning methods, including logistic regression, stochastic gradient descent, variations of the naive Bayes models, and support vector classification, were applied to the dataset. For deep learning, a character-level convolutional neural network (CNN) model was adapted with a bidirectional long short-term memory approach. The collected data were tested under various classification tasks, including two-, three- and four-way ADI tasks. The results revealed that the classical machine learning algorithms outperformed the CNN approach. Moreover, term frequency-inverse document frequency (TF-IDF) weighting combined with a character n-gram model ranging from unigrams to four-grams achieved the best performance among the tested parameters.
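The abstract does not give implementation details, so the following is only a minimal sketch of the feature extraction it describes: character n-grams from unigrams to four-grams, weighted by TF-IDF. Everything else is an assumption made for illustration. The smoothed IDF formula is one common variant, the nearest-centroid classifier is a simple stand-in for the classical models named in the study (it is not one of them), and the training strings are synthetic placeholders rather than real dialect text.

```python
import math
from collections import Counter


def char_ngrams(text, n_min=1, n_max=4):
    """Return all character n-grams of length n_min..n_max found in text."""
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


def fit_idf(docs):
    """Smoothed inverse document frequency for every n-gram seen in docs."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(char_ngrams(doc)))
    return {g: math.log((1 + n_docs) / (1 + c)) + 1 for g, c in doc_freq.items()}


def tfidf_vector(text, idf):
    """L2-normalised sparse TF-IDF vector; n-grams unseen in training are ignored."""
    tf = Counter(g for g in char_ngrams(text) if g in idf)
    vec = {g: count * idf[g] for g, count in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {g: w / norm for g, w in vec.items()}


def fit_centroids(docs, labels, idf):
    """Sum the TF-IDF vectors of each class (stand-in classifier, not from the study)."""
    centroids = {}
    for doc, label in zip(docs, labels):
        centroid = centroids.setdefault(label, Counter())
        for g, w in tfidf_vector(doc, idf).items():
            centroid[g] += w
    return centroids


def predict(text, idf, centroids):
    """Return the label whose class centroid has the highest cosine similarity."""
    vec = tfidf_vector(text, idf)

    def cosine(centroid):
        dot = sum(w * centroid.get(g, 0.0) for g, w in vec.items())
        norm = math.sqrt(sum(w * w for w in centroid.values())) or 1.0
        return dot / norm

    return max(centroids, key=lambda label: cosine(centroids[label]))


# Synthetic placeholder data: two made-up "dialects" with different character patterns.
train_texts = ["abab abba", "baba abab", "xyxy yxxy", "yxyx xyyx"]
train_labels = ["A", "A", "B", "B"]

idf = fit_idf(train_texts)
centroids = fit_centroids(train_texts, train_labels, idf)
print(predict("abba baba", idf, centroids))
```

Because the features are built from characters rather than a fixed word list, any new sentence can be vectorised, which is the sense in which such a model is "effectively unrestricted in its vocabulary". In practice one would likely swap the stand-in classifier for a library implementation of logistic regression or a linear support vector classifier.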

Similar articles

       
 
Minxing Dong, Jichao Yang, Yushan Fu, Tengfei Fu, Qing Zhao, Xuelei Zhang, Qinzeng Xu and Wenquan Zhang    
The soft coral order Alcyonacea is a common coral found in the deep sea and plays a crucial role in the deep-sea ecosystem. This study aims to predict the distribution of Alcyonacea in the western Pacific Ocean using four machine learning-based species d...

 
Dongkeun Lee, Chaeog Lim, Sang-jin Oh, Minjoon Kim, Jun Soo Park and Sung-chul Shin    
Capsizing accidents are regarded as marine accidents with a high rate of casualties per accident. Approximately 89% of all such accidents involve small ships (vessels with gross tonnage of less than 10 tons). Stability calculations are critical for asses...

 
Qishun Mei and Xuhui Li    
To address the limitations of existing methods of short-text entity disambiguation, specifically in terms of their insufficient feature extraction and reliance on massive training samples, we propose an entity disambiguation model called COLBERT, which f...
Journal: Information

 
Jawaher Alghamdi, Yuqing Lin and Suhuai Luo    
The detection of fake news has emerged as a crucial area of research due to its potential impact on society. In this study, we propose a robust methodology for identifying fake news by leveraging diverse aspects of language representation and incorporati...
Journal: Information

 
Shifeng Chen, Jialin Wang and Ketai He    
The popularization of the internet and the widespread use of smartphones have led to a rapid growth in the number of social media users. While information technology has brought convenience to people, it has also given rise to cyberbullying, which has a ...
Journal: Information