Future Internet / Vol: 13 Par: 1 (2021)

Authorship Identification of a Russian-Language Text Using Support Vector Machine and Deep Neural Networks

Aleksandr Romanov, Anna Kurtukova, Alexander Shelupanov, Anastasia Fedotova and Valery Goncharov

Abstract

The article explores approaches to determining the author of a natural language text and the advantages and disadvantages of these approaches. The problem is important because of the active digitalization of society and the shift of most everyday activities online. Authorship identification methods are particularly useful for information security and forensics. For example, such methods can be used to identify the authors of suicide notes and other texts subjected to forensic examination. Another area of application is plagiarism detection, which is relevant both to intellectual property protection in the digital space and to the educational process. The article describes identifying the author of a Russian-language text using a support vector machine (SVM) and deep neural network architectures (long short-term memory (LSTM), convolutional neural networks (CNN) with attention, Transformer). The results show that all the considered algorithms are suitable for solving the authorship identification problem, but SVM shows the best accuracy: its average accuracy reaches 96%. This is due to carefully chosen parameters and a feature space that includes statistical and semantic features (including those extracted through aspect analysis). Deep neural networks are inferior to SVM in accuracy and reach only 93%. The study also evaluates the impact of attacks on the method on the models' accuracy. Experiments show that the SVM-based methods are not robust to deliberate text anonymization. In comparison, the loss in accuracy of deep neural networks does not exceed 20%. The Transformer architecture is the most effective for anonymized texts and achieves 81% accuracy.
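
As a rough illustration of the SVM route described above, the following is a minimal sketch and not the authors' implementation. The abstract does not list the actual features, so character trigram frequencies are assumed here as one example of a statistical feature. The function names (`char_ngram_features`, `train_linear_svm`, `predict`) and the toy sentences are hypothetical. The classifier is a two-author linear SVM without a bias term, fitted by plain subgradient descent on the L2-regularized hinge loss. A multi-author setting would need, for example, a one-vs-rest scheme on top of it.

```python
from collections import Counter


def char_ngram_features(text, n=3):
    """Relative frequencies of character n-grams (an assumed statistical style feature)."""
    text = text.lower()
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    if not grams:
        return {}
    counts = Counter(grams)
    return {g: c / len(grams) for g, c in counts.items()}


def dot(w, x):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_linear_svm(samples, labels, epochs=200, lr=0.1, lam=0.01):
    """Binary linear SVM (no bias term) trained by subgradient descent.

    samples: list of sparse feature dicts; labels: +1 or -1 per sample.
    Minimizes lam/2 * ||w||^2 + hinge loss with a fixed learning rate.
    """
    w = {}
    shrink = 1.0 - lr * lam
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * dot(w, x)
            # Gradient step for the L2 regularizer: shrink every weight.
            for k in w:
                w[k] *= shrink
            # Hinge-loss subgradient is non-zero only inside the margin.
            if margin < 1:
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + lr * y * v
    return w


def predict(w, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    return 1 if dot(w, x) > 0 else -1


# Toy usage with two made-up "authors" (illustrative sentences, not real data).
texts = [
    "Ну что же, поглядим, что из этого выйдет.",
    "Согласно пункту договора, оплата производится в срок.",
]
authors = [1, -1]
model = train_linear_svm([char_ngram_features(t) for t in texts], authors)
print(predict(model, char_ngram_features("Ну что же, поглядим.")))
```

Sparse dictionaries keep the sketch dependency-free. In practice a library SVM with tuned parameters and a richer feature space, as the abstract emphasizes, would be the more realistic choice.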
