Evaluation of Optimal Number of Topics of Topic Model: An Approach Based on the Quality of Clusters

Fedor Krasnov

Abstract

Although topic models have been used to cluster documents for more than ten years, choosing the optimal number of topics remains an open problem. The authors analyzed many fundamental studies on this subject from recent years. The main difficulty is the lack of a stable metric for the quality of the topics obtained when a topic model is built. The authors analyzed the internal metrics of the topic model (Coherence, Contrast, and Purity) as a means of determining the optimal number of topics and concluded that they are not applicable to this problem. They then analyzed an approach to choosing the optimal number of topics based on the quality of the clusters. For this purpose, they considered the behavior of three cluster validation metrics: the Davies-Bouldin index, the Silhouette Coefficient, and the Calinski-Harabasz index.

The proposed new method of determining the optimal number of topics rests on the following principles: setting up a topic model with additive regularization (ARTM) to separate out noise topics; using dense vector representations (GloVe, FastText, Word2Vec); and using the cosine measure as the distance in the cluster metrics, since it works better than the Euclidean distance on high-dimensional vectors.

The methodology was tested on a collection of scientific articles from the OnePetro library, selected by specific themes. The experiment showed that the proposed method makes it possible to assess the optimal number of topics for a topic model built on a small collection of English-language documents.
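The idea of scoring candidate topic counts by cluster quality under a cosine distance can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy document vectors, and the candidate labelings are all hypothetical, and only the Silhouette Coefficient is shown. In practice the vectors would presumably come from a dense embedding (GloVe, FastText, Word2Vec) and the labels from the dominant topic that a fitted topic model assigns to each document.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def silhouette_cosine(vectors, labels):
    """Mean Silhouette Coefficient over all documents, using cosine distance."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the document's own cluster
        a = sum(cosine_distance(vectors[i], vectors[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(cosine_distance(vectors[i], vectors[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

def best_topic_count(vectors, labelings):
    """Score each candidate topic count; `labelings` maps count -> labels."""
    scored = {k: silhouette_cosine(vectors, labels) for k, labels in labelings.items()}
    return max(scored, key=scored.get), scored

# Hypothetical toy data: six documents in a 3-dimensional embedding space.
doc_vectors = [
    [1.0, 0.1, 0.0], [0.9, 0.2, 0.0],
    [0.0, 1.0, 0.1], [0.1, 0.9, 0.0],
    [0.0, 0.1, 1.0], [0.1, 0.0, 0.9],
]
# Hypothetical dominant-topic labels for models with 2 and 3 topics.
candidate_labelings = {
    2: [0, 0, 0, 0, 1, 1],
    3: [0, 0, 1, 1, 2, 2],
}
best_k, scores = best_topic_count(doc_vectors, candidate_labelings)
```

On this toy data the three-topic labeling scores higher than the two-topic one, because each pair of near-parallel vectors forms its own tight cluster. The Davies-Bouldin and Calinski-Harabasz indices could be compared across candidate counts in the same way.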

PAGES

pp. 8 - 15

ISSUE

Volume: 7 Number: 2 Part: 0 (2019)

SUBJECTS

CIVIL ENGINEERING AND CONSTRUCTION
TECHNOLOGY
