Information, Vol. 12, No. 10 (2021)

Topic Modeling for Amharic User Generated Texts

Girma Neshir, Andreas Rauber and Solomon Atnafu

Abstract

Topic modeling is a statistical process that derives latent themes from large collections of text. Three approaches to topic modeling exist: unsupervised, semi-supervised and supervised. In this work, we develop a supervised topic model for an Amharic corpus. We also investigate the effect of stemming on topic detection using Term Frequency-Inverse Document Frequency (TF-IDF) features, Latent Dirichlet Allocation (LDA) features, and a combination of the two feature sets, with four supervised machine learning methods: Support Vector Machine (SVM), Naive Bayes (NB), Logistic Regression (LR), and Neural Networks (NN). We evaluate our approach on an Amharic corpus of 14,751 documents in ten topic categories. Both qualitative and quantitative analyses show that the best configuration of our supervised topic detection reaches an accuracy of 88%, obtained by SVM using state-of-the-art TF-IDF word features with the Synthetic Minority Over-sampling Technique (SMOTE) and no stemming. The results also show that text features with stemming slightly improve the performance of the topic classifier over features with no stemming.
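
To make two of the building blocks named in the abstract concrete, the following is a minimal sketch of TF-IDF weighting and of the interpolation step at the core of SMOTE. It is an illustration under stated assumptions, not the authors' implementation: the function names `tfidf` and `smote_sample` are hypothetical, the toy corpus uses English placeholder words rather than the Amharic data, the TF-IDF variant is the plain unsmoothed form, and the neighbor passed to `smote_sample` is chosen by hand, whereas SMOTE proper draws it from the k nearest minority-class neighbors.

```python
import math
import random
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized, non-empty document.

    Weight = (term count / document length) * log(N / document frequency).
    This is the plain, unsmoothed variant; toolkits often add smoothing
    and length normalization on top of it.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return vectors


def smote_sample(x, neighbor, rng):
    """Create one synthetic point on the segment between x and neighbor.

    Only the interpolation step of SMOTE is shown; the neighbor search
    is left out and the neighbor is supplied by the caller.
    """
    gap = rng.random()  # uniform in [0, 1)
    terms = set(x) | set(neighbor)
    return {
        t: x.get(t, 0.0) + gap * (neighbor.get(t, 0.0) - x.get(t, 0.0))
        for t in terms
    }


# Hypothetical toy corpus (placeholders, not the Amharic corpus).
docs = [
    "team wins football match".split(),
    "football team loses match".split(),
    "parliament passes new law".split(),
]
vectors = tfidf(docs)

# Treat the first two documents as members of the same minority class
# and synthesize one extra training vector between them.
synthetic = smote_sample(vectors[0], vectors[1], random.Random(0))
```

In a full pipeline, vectors like these (original plus synthetic) would be fed to a classifier such as an SVM, and over-sampling would be applied to the training split only so that synthetic points do not leak into evaluation.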
