ARTICLE
TITLE

On the Role of Prepositional Statistics for Genre Identification of Russian Texts

O. A. Mitrofanova    
A. D. Moskvina    

Abstract

In this work we investigate the role of statistical data on function words in the automatic identification of the genre and topical characteristics of Russian texts. The principal linguistic parameter is the frequency ratio of semantically related prepositions. We consider seven frequent prepositions that have a spatial meaning and also one or more figurative meanings: под (under) / над (over), в (in) / из (from), к (to) / от (from), за (behind) / перед (in front of), в (in) / на (at), на (at) / с (from). Our research hypothesis is that the coefficients of preposition frequency ratios in these pairs may indicate stylistic properties of texts. The research is based on several corpora representing different genres and topics: the general, literary, publicistic, non-literary and oral subcorpora of the Russian National Corpus (RNC); the Russian corpora of the Aranea family of super-large corpora, namely Araneum Russicum Russicum and Araneum Russicum Externum; a social media corpus of posts and comments from Facebook and Twitter; and a literary corpus of texts from the Librusec digital library. Using statistical analysis of polysemous prepositions, we verified the hypothesis that the oral and written speech of social media users is stylistically homogeneous. The experiments demonstrated the significance of the под (under) / над (over) coefficient for style and text type detection, and revealed the role of the в (in) / из (from) and за (behind) / перед (in front of) coefficients in differentiating written and oral texts. We obtained evidence on the statistics of preposition occurrence, as well as information on the semantic content of prepositional phrases, both of which are of great significance for detecting text style, genre and topic. We also identified and analyzed the main properties of the use of polysemous prepositions.
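The abstract does not give the exact formula for the ratio coefficients or name the statistical test used, so the following is only a minimal sketch under stated assumptions. It assumes a pair's coefficient is the raw frequency of the first preposition divided by that of the second, counted over lowercase word tokens. It also assumes a Pearson chi-square test of homogeneity on a 2x2 table is one reasonable way to check whether a pair is distributed alike in two corpora (e.g. oral vs. written). The function names `preposition_counts`, `ratio_coefficients` and `chi_square_2x2` and the `VARIANTS` map are hypothetical illustrations, not the authors' code.

```python
import re
from collections import Counter

# Preposition pairs from the abstract (glosses as given there).
PAIRS = [
    ("под", "над"),   # under / over
    ("в", "из"),      # in / from
    ("к", "от"),      # to / from
    ("за", "перед"),  # behind / in front of
    ("в", "на"),      # in / at
    ("на", "с"),      # at / from
]

# Assumption: fold common orthographic variants into the base preposition.
# "надо" is deliberately left out: it is usually the modal word "must".
VARIANTS = {"во": "в", "со": "с", "ко": "к", "изо": "из",
            "ото": "от", "подо": "под", "передо": "перед"}


def preposition_counts(text):
    """Count lowercase Cyrillic word tokens, folding variants into base forms."""
    counts = Counter(re.findall(r"[а-яё]+", text.lower()))
    for variant, base in VARIANTS.items():
        counts[base] += counts.pop(variant, 0)
    return counts


def ratio_coefficients(text):
    """Map each pair (a, b) to freq(a) / freq(b); None when b does not occur."""
    counts = preposition_counts(text)
    return {(a, b): (counts[a] / counts[b] if counts[b] else None)
            for a, b in PAIRS}


def chi_square_2x2(a1, b1, a2, b2):
    """Pearson chi-square statistic (df = 1, no continuity correction) for
    counts of prepositions a and b in corpus 1 (a1, b1) and corpus 2 (a2, b2)."""
    observed = ((a1, b1), (a2, b2))
    n = a1 + b1 + a2 + b2
    rows = (a1 + b1, a2 + b2)
    cols = (a1 + a2, b1 + b2)
    if 0 in rows or 0 in cols:
        raise ValueError("every row and column total must be positive")
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat


if __name__ == "__main__":
    sample = "Кот сидит под столом, а лампа висит над столом. Он пришёл во двор из сада."
    print(ratio_coefficients(sample)[("под", "над")])  # 1.0
    # 3.841 is the 5% critical value of chi-square with one degree of freedom.
    print(chi_square_2x2(20, 5, 5, 20) > 3.841)  # True
```

A real study would use a proper tokenizer and morphological tagger to separate prepositions from homographs, and could swap the hand-written test for `scipy.stats.chi2_contingency`. The plain-Python version is kept here only so the arithmetic stays visible.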

Similar articles

       
 
Tatiana Litvinova, Ekaterina Ryzhkova. pp. 32-36
A text reflects a range of combinations of individual interacting characteristics of its author, both stable (gender, psychological traits, neuropsychological characteristics) and variable (feelings, emotions). It is obvious that it is not in isolation ...

 
A. Galieva, O. Nevzorova. pp. 85-93
This paper discusses main sources and methodology of compiling the Tatar-Russian Socio-Political Dictionary of collocations. The area of collocations within the language system is of particular importance, and the well-known language-specificity of collo...

 
T. A. Litvinova, O. V. Zagorovskaya, O. A. Litvinova. pp. 58-63
Text-based deception detection is presently on the way to gain even more significance as related studies certainly have both theoretical and practical value and a range of applications for police, security, and customs, as well as predatory communication...