ARTICLE
TITLE

Extraction of the Relations among Significant Pharmacological Entities in Russian-Language Reviews of Internet Users on Medications

Alexander Sboev    
Anton Selivanov    
Ivan Moloshnikov    
Roman Rybka    
Artem Gryaznov    
Sanna Sboeva and Gleb Rylkov    

Abstract

Analyzing digital media to predict society's reaction to particular events and processes is now a task of great significance. Internet sources contain a large amount of meaningful information for a range of domains, such as marketing, author profiling, social situation analysis, and healthcare. In healthcare, this information is useful for pharmacovigilance purposes, including the re-profiling of medications. Analyzing these sources requires automatic natural language processing methods. These methods, in turn, require text datasets with complex annotation that includes named entities and the relations between them. As an analysis of the relevant literature shows, Russian-language datasets with annotated entity relations are scarce, and none have existed so far in the medical domain. This paper presents the first Russian-language textual corpus in which entities carry labels of different contexts within a single text, so that related entities share a common context. The corpus belongs to the medical domain and is therefore suitable for the relation extraction task in that domain. Our second contribution is a method for the automated extraction of entity relations in Russian-language texts using the XLM-RoBERTa language model, additionally pretrained on Russian drug review texts. The proposed method is compared with other machine learning methods to estimate its efficiency. The method yields state-of-the-art accuracy in extracting the following relationship types: ADR–Drugname, Drugname–Diseasename, Drugname–SourceInfoDrug, and Diseasename–Indication. On the presented subcorpus of the Russian Drug Review Corpus, the method achieves a mean F1-score of 80.4% (estimated with cross-validation and averaged over the four relationship types). This result is 3.6% higher than that of the existing language model RuBERT and 21.77% higher than that of basic ML classifiers.
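
The reported metric, a mean F1-score averaged over the four relationship types, can be illustrated with a minimal Python sketch. It assumes each relationship type is scored as its own binary problem and that the per-type scores are averaged with equal weight; the abstract does not spell out the exact averaging procedure. The function names and the counts below are hypothetical and are not results from the paper.

```python
from typing import Dict, Tuple


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 for one relation type from true-positive, false-positive
    and false-negative counts (0.0 when nothing was found or expected)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mean_f1(counts: Dict[str, Tuple[int, int, int]]) -> float:
    """Unweighted mean of the per-type F1 scores (an assumed averaging scheme)."""
    return sum(f1_score(*c) for c in counts.values()) / len(counts)


# Hypothetical (tp, fp, fn) counts, for illustration only.
example_counts = {
    "ADR-Drugname": (8, 2, 2),
    "Drugname-Diseasename": (9, 1, 1),
    "Drugname-SourceInfoDrug": (6, 2, 2),
    "Diseasename-Indication": (7, 3, 3),
}

if __name__ == "__main__":
    for relation, c in example_counts.items():
        print(f"{relation}: F1 = {f1_score(*c):.3f}")
    print(f"mean F1 = {mean_f1(example_counts):.4f}")
```

Under cross-validation, the same computation would be repeated on each held-out fold and the fold scores averaged.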
