ARTICLE
TITLE

LiveJournal topic models and their improvement with contextualized representations for creating a model of hidden communities

Ivan Mamaev    
Olga Mitrofanova    

Abstract

Social networks reflect contemporary tendencies in society. These tendencies allow users to form communities connected by both explicit and hidden links, and the latter are of current interest to scholars. Modern algorithms are effective, but they do not take the linguistic parameters of datasets into account. This gap can be filled by an algorithm that combines linguistic and quantitative data analysis. The purpose of the study is to detect hidden links among users' posts in the Russian segment of LiveJournal by means of topic modeling. The corpus currently contains more than 95,490 posts by 132 users. The procedure for constructing a model of hidden communities has three stages. First, the corpus is processed with the Stanza library, which handles tokenization and lemmatization of the posts in a single pipeline, together with the removal of manually selected stopwords. Second, contextualized topic models are built and manually annotated. Finally, a semantic network of users is built with Easy Linavis and Gephi. The resulting model of hidden communities is represented as a set of vertices connected by edges. The results provide new information about possible social groups in the Russian segment of social networks, which can be analyzed further from a linguistic perspective.
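
The final stage, a user network of vertices and edges, can be illustrated with a minimal sketch. It is not the authors' implementation. The user names, the topic labels, the rule that two users are linked when they share at least one annotated topic, and the helper names `build_edges` and `edges_to_csv` are all assumptions made for illustration. The output is an edge table with `Source`, `Target` and `Weight` columns, a layout that Gephi's spreadsheet import accepts.

```python
import csv
import io
from itertools import combinations


def build_edges(user_topics, min_shared=1):
    """Link every pair of users that shares at least `min_shared` topics.

    `user_topics` maps a user name to the set of topic labels assigned
    to that user's posts. The edge weight is the number of shared topics.
    This linking rule is an assumption, not the method from the article.
    """
    edges = []
    for u, v in combinations(sorted(user_topics), 2):
        shared = user_topics[u] & user_topics[v]
        if len(shared) >= min_shared:
            edges.append((u, v, len(shared)))
    return edges


def edges_to_csv(edges):
    """Serialize edges as CSV text with Source, Target, Weight columns."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    writer.writerows(edges)
    return buf.getvalue()


# Hypothetical users and manually annotated topic labels.
user_topics = {
    "user_a": {"politics", "travel"},
    "user_b": {"travel", "cooking"},
    "user_c": {"cinema"},
}

edges = build_edges(user_topics)
print(edges_to_csv(edges))
```

In this toy example only `user_a` and `user_b` are connected, because `travel` is the only topic shared by any pair of users. Raising `min_shared` would make the network sparser.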
