Identification of Propaganda Documents in the News Text Corpora

Ravil Mukhamediev    
Olga Filatova    
Kirill Yakunin    


This article demonstrates how topic modeling can be used to identify propaganda in the media. As information confrontation between countries intensifies, propaganda and counter-propaganda come to the forefront: states need to protect their citizens from informational threats and ensure their safety, which is a necessary condition for the further development of the state. Achieving this requires research projects that test methods for identifying propaganda. The article presents one such project, which applies artificial intelligence systems to applied research at the intersection of machine learning, natural language processing, and social studies. The described approach to identifying a phenomenon as semantically fuzzy as propaganda is proposed for the first time. The following definition of political propaganda is suggested: a coordinated, systematic informational influence exerted by the subject of propaganda on target audiences to achieve political goals and promote political ideas. The proposed method includes four main stages: formation of corpus sections; calculation of a topic model for the overall corpus; calculation of imbalance estimates of the corpora for each topic; and extrapolation of the imbalance estimates to all documents. The method was cross-checked on a subsample of 1,000 news items labeled by an expert and showed fairly high classification quality: the F1 score (the harmonic mean of precision and recall) ranges from 0.72 to 0.94, depending on the selected threshold.
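The four stages can be illustrated with a minimal sketch. The abstract does not give the authors' formulas, so everything here is an assumption made for illustration: the document-topic matrix `theta` is taken as already computed by some topic model, the corpus is split into two hypothetical sections by a boolean mask, the imbalance of a topic is defined as the normalized difference of its mean weight in the two sections, and a document's score is the topic-weighted sum of those imbalances. The function names are hypothetical as well.

```python
import numpy as np

def topic_imbalance(theta, in_section_a):
    """Per-topic imbalance in [-1, 1] between two corpus sections.

    ASSUMPTION: imbalance = (mean_A - mean_B) / (mean_A + mean_B);
    this is an illustrative formula, not one taken from the article.
    """
    theta = np.asarray(theta, dtype=float)
    mask = np.asarray(in_section_a, dtype=bool)
    mass_a = theta[mask].mean(axis=0)    # mean topic weight in section A
    mass_b = theta[~mask].mean(axis=0)   # mean topic weight in section B
    total = mass_a + mass_b
    # Topics with zero weight in both sections get a neutral score of 0.
    return np.divide(mass_a - mass_b, total,
                     out=np.zeros_like(total), where=total > 0)

def document_scores(theta, imbalance):
    """Extrapolate topic imbalances to documents (ASSUMED weighted sum)."""
    return np.asarray(theta, dtype=float) @ imbalance

def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0

# Toy data (invented): 4 documents, 2 topics; rows are topic distributions.
theta = np.array([[0.9, 0.1],
                  [0.8, 0.2],
                  [0.2, 0.8],
                  [0.1, 0.9]])
in_section_a = np.array([True, True, False, False])  # stage 1: corpus sections
labels = np.array([True, True, False, False])        # hypothetical expert labels

imbalance = topic_imbalance(theta, in_section_a)     # stage 3
scores = document_scores(theta, imbalance)           # stage 4

# Sweep a decision threshold, mirroring the threshold-dependent F1 in the text.
for threshold in (-0.5, 0.0, 0.5):
    print(threshold, f1_score(labels, scores > threshold))
```

Sweeping the threshold shows why a range of F1 values is reported rather than a single number: a looser threshold flags more documents and trades precision for recall, while a stricter one does the opposite.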