Information, Vol. 12, No. 10 (2021)
ARTICLE
TITLE

Topic Modeling for Amharic User Generated Texts

Girma Neshir, Andreas Rauber and Solomon Atnafu

Abstract

Topic modeling is a statistical process that derives latent themes from large collections of text. Three approaches to topic modeling exist: unsupervised, semi-supervised and supervised. In this work, we develop a supervised topic model for an Amharic corpus. We also investigate the effect of stemming on topic detection using Term Frequency-Inverse Document Frequency (TF-IDF) features, Latent Dirichlet Allocation (LDA) features and a combination of the two feature sets, with four supervised machine learning tools: Support Vector Machine (SVM), Naive Bayesian (NB), Logistic Regression (LR) and Neural Nets (NN). We evaluate our approach on an Amharic corpus of 14,751 documents in ten topic categories. Both qualitative and quantitative analysis of the results show that the proposed supervised topic detection performs best with SVM, reaching an accuracy of 88% using state-of-the-art TF-IDF word features with the Synthetic Minority Over-sampling Technique (SMOTE) applied and with no stemming operation. The results show that text features with stemming slightly improve the performance of the topic classifier over features with no stemming.
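To make two of the ingredients named in the abstract concrete, the following is a minimal pure-Python sketch of TF-IDF weighting and of SMOTE-style oversampling by interpolation. It is an illustration under stated assumptions, not the authors' implementation: the function names `tfidf_vectors` and `smote_like`, the smoothed IDF formula, and the toy English corpus are all hypothetical stand-ins for the Amharic data and whatever tooling the study actually used.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) of tf * idf weights for tokenized docs."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed idf (an assumption), so no term gets a zero weight.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[t] / len(doc) * idf[t] for t in vocab])
    return vocab, vectors


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(
            key=lambda j: sum((a - b) ** 2 for a, b in zip(base, minority[j]))
        )
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical toy corpus: three "politics" documents, two "sport" documents.
docs = [
    ["election", "vote", "party"],
    ["vote", "party", "parliament"],
    ["parliament", "election", "law"],
    ["match", "goal", "team"],
    ["team", "goal", "league"],
]
labels = ["politics", "politics", "politics", "sport", "sport"]

vocab, X = tfidf_vectors(docs)
minority = [x for x, y in zip(X, labels) if y == "sport"]
# One synthetic "sport" vector balances the two classes at three each.
synthetic = smote_like(minority, n_new=1, k=1)
X_balanced = X + synthetic
y_balanced = labels + ["sport"] * len(synthetic)
```

The balanced vectors could then be passed to any of the classifiers listed above. The sketch leaves out stemming, LDA features and the classifier itself.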
