ARTICLE
TITLE

Development of the quantitative method for automated text content authorship attribution based on the statistical analysis of N-grams distribution

Vasyl Lytvyn    
Victoria Vysotska    
Ihor Budz    
Yaroslav Pelekh    
Nataliia Sokulska    
Roman Kovalchuk    
Lyudmyla Dzyubyk    
Oksana Tereshchuk    
Myroslav Komar    

Abstract

This paper considers the application of linguo-statistical technologies to identifying an author's style in scientific and technical text content. Quantitative linguistic analysis of a text draws on content monitoring based on NLP methods to identify and analyze sets of stop words, keywords, and set phrases, and to study N-grams. The N-grams are used in linguometric methods to estimate, as a percentage, the degree to which a given text belongs to a particular author. A quantitative method for automated authorship attribution of text content was developed based on statistical analysis of the 3-gram distribution. An approach to identifying the author of Ukrainian-language scientific and technical texts was proposed. Experimental results were obtained for determining whether an analyzed text belongs to a specific author when a reference text is available. Applying linguo-statistical analysis of 3-grams to a set of articles makes it possible to form a subset of publications that are similar in their linguistic descriptions. Imposing additional conditions on this subset in the form of statistical and quantitative analyses (a set of keywords, set expressions, stylometric and linguometric analyses, etc.) significantly reduces it by narrowing the list of the most likely authors. For qualitative and effective content analysis when determining the degree to which a text belongs to a particular author, we propose analyzing the reference text and the text under consideration in several stages: linguometric analysis of the coefficients of diversity of the author's speech, stylometric analysis, analysis of set expressions, and linguo-statistical analysis of 3-grams. For automated text processing, what matters is not only how often a certain category occurs but also whether it occurs in the studied text at all.
Quantitative computation makes it possible to draw objective conclusions about the orientation of materials from the number of times the units of analysis are used in the studied texts. Qualitative analysis does the same, but based on whether (and in what context) a certain important original category occurs at all.
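
To make the idea of comparing a reference text with an analyzed text through their 3-gram distributions more concrete, the following is a minimal sketch only. The abstract does not state which statistic the authors use, so several choices here are assumptions made for illustration: character-level 3-grams, relative frequencies as the distribution, and cosine similarity scaled to a percentage. The function names (trigram_profile, profile_similarity, authorship_score_percent) are hypothetical and do not come from the article.

```python
import math
from collections import Counter
from typing import Dict


def trigram_profile(text: str) -> Dict[str, float]:
    """Return relative frequencies of character 3-grams in the text.

    Assumption: character-level 3-grams over lower-cased text with
    collapsed whitespace; the article may define 3-grams differently.
    """
    cleaned = " ".join(text.lower().split())
    counts = Counter(cleaned[i:i + 3] for i in range(len(cleaned) - 2))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: n / total for gram, n in counts.items()}


def profile_similarity(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Cosine similarity between two 3-gram frequency profiles (0.0 to 1.0).

    Assumption: cosine similarity stands in for whatever statistical
    comparison the article actually applies.
    """
    dot = sum(weight * q.get(gram, 0.0) for gram, weight in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)


def authorship_score_percent(reference: str, candidate: str) -> float:
    """Express the similarity of the two texts' 3-gram profiles in per cent."""
    return 100.0 * profile_similarity(
        trigram_profile(reference), trigram_profile(candidate)
    )
```

Under these assumptions, a higher score would place the analyzed text in the subset of publications that resemble the reference text; the further stages named in the abstract (keywords, set expressions, stylometric and linguometric analyses) would then be needed to narrow that subset.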
