Applied Sciences, Vol. 12, Issue 23 (2022)
ARTICLE
TITLE

Applying a Character-Level Model to a Short Arabic Dialect Sentence: A Saudi Dialect as a Case Study

Tahani Alqurashi    

Abstract

Arabic dialect identification (ADI) has recently drawn considerable interest among researchers in language recognition and natural language processing. This study investigated the use of a character-level model, which is effectively unrestricted in its vocabulary, to identify fine-grained Arabic dialects in short written texts. The four main Saudi dialects across the country were considered. The proposed ADI approach consists of five main phases: dialect data collection, data preprocessing and labelling, character-based feature extraction, modelling with a deep learning character-based model and classical machine learning character-based models, and model performance evaluation. Several classical machine learning methods, including logistic regression, stochastic gradient descent, variations of the naive Bayes model, and support vector classification, were applied to the dataset. For deep learning, a character convolutional neural network (CNN) model was adapted with a bidirectional long short-term memory approach. The collected data were tested under various classification tasks, including two-, three- and four-way ADI tasks. The results revealed that the classical machine learning algorithms outperformed the CNN approach. Moreover, term frequency-inverse document frequency (TF-IDF) weighting combined with character n-grams ranging from unigrams to four-grams achieved the best performance among the tested parameters.
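The character-based feature extraction named in the abstract (TF-IDF over character n-grams from unigrams to four-grams) can be illustrated with a minimal pure-Python sketch. This is not the author's implementation: the function names, the smoothed IDF formula, the L2 normalisation, and the toy strings are all assumptions made for illustration, and the classifier stage (for example logistic regression) is left out.

```python
import math
from collections import Counter


def char_ngrams(text, n_min=1, n_max=4):
    """Return every character n-gram of length n_min..n_max, shortest first."""
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


def fit_idf(docs, n_min=1, n_max=4):
    """Compute a smoothed inverse document frequency for each n-gram.

    Assumption: idf = ln((1 + N) / (1 + df)) + 1, a common smoothed variant;
    the abstract does not say which weighting formula was used.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        # Count each n-gram at most once per document.
        doc_freq.update(set(char_ngrams(doc, n_min, n_max)))
    return {gram: math.log((1 + n_docs) / (1 + df)) + 1.0
            for gram, df in doc_freq.items()}


def tfidf_vector(text, idf, n_min=1, n_max=4):
    """Return a sparse, L2-normalised TF-IDF vector as a dict.

    N-grams that were not seen when fitting the IDF table are dropped.
    """
    counts = Counter(g for g in char_ngrams(text, n_min, n_max) if g in idf)
    weights = {g: c * idf[g] for g, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {g: w / norm for g, w in weights.items()} if norm else {}


def cosine(u, v):
    """Cosine similarity of two L2-normalised sparse vectors."""
    return sum(w * v.get(g, 0.0) for g, w in u.items())


# Toy placeholder strings (hypothetical, not taken from the paper's dataset).
docs = ["abcab", "abcd", "xyzx"]
idf = fit_idf(docs)
vecs = [tfidf_vector(d, idf) for d in docs]
```

Because the features are built from characters rather than whole words, any new sentence maps onto the same feature space even when it contains words never seen before, which is the sense in which the vocabulary is effectively unrestricted. In a full pipeline, vectors like `vecs` would be passed to one of the classifiers listed in the abstract.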
