Knowledge-Based Intelligent Text Simplification for Biological Relation Extraction

Jaskaran Gill

Madhu Chetty

Suryani Lim and Jennifer Hallinan

Abstract

Relation extraction from biological publications plays a pivotal role in accelerating scientific discovery and advancing medical research. Although vast amounts of this knowledge are stored in the published literature, extracting it manually from a continually growing volume of documents is becoming increasingly arduous. Recently, attention has turned to extracting such knowledge automatically with pre-trained Large Language Models (LLMs) and deep-learning algorithms. However, the complex syntactic structure of biological sentences, with nested entities and domain-specific terminology, together with insufficient annotated training corpora, poses major challenges to accurately capturing entity relationships from unstructured data. To address these issues, we propose a Knowledge-based Intelligent Text Simplification (KITS) approach focused on the accurate extraction of biological relations. KITS precisely captures the relational context among the various binary relations within a sentence while preventing any change in the meaning of the sentences it simplifies. Experiments using well-known performance metrics show that the proposed technique increases precision by 21% on the Learning Language in Logic (LLL) dataset while simplifying only 25% of its sentences. When combined with the proposed method, BioBERT, a popular pre-trained LLM, outperforms other state-of-the-art methods.
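The abstract frames extraction in terms of binary relations within a sentence and reports its gain as a change in precision. As a minimal sketch of that framing, assuming a simple pair-level evaluation (the authors' exact protocol is not described here), the Python below treats every pair of entity mentions found by named entity recognition as one candidate binary relation and scores predicted pairs against gold pairs. The function names `candidate_pairs` and `pair_scores` and the gene names are hypothetical; the sketch does not implement KITS itself and, for simplicity, ignores relation direction.

```python
from itertools import combinations


def candidate_pairs(entities):
    """Return every unordered pair of distinct entity mentions.

    Each pair is one candidate binary relation that a classifier
    (e.g. a fine-tuned BioBERT model) would label as related or not.
    Relation direction is ignored in this sketch (an assumption).
    """
    unique = sorted(set(entities))
    return list(combinations(unique, 2))


def pair_scores(predicted, gold):
    """Pair-level precision, recall and F1.

    `predicted` and `gold` are sets of (entity_a, entity_b) tuples
    that follow the same ordering convention as candidate_pairs().
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy sentence with three recognised gene mentions.
entities = ["geneA", "geneB", "geneC"]
pairs = candidate_pairs(entities)
print(pairs)  # [('geneA', 'geneB'), ('geneA', 'geneC'), ('geneB', 'geneC')]

gold = {("geneA", "geneB"), ("geneA", "geneC")}       # hypothetical annotations
predicted = {("geneA", "geneB"), ("geneB", "geneC")}  # hypothetical model output
print(pair_scores(predicted, gold))  # (0.5, 0.5, 0.5)
```

A sentence with n entity mentions yields n(n-1)/2 candidate pairs, so long sentences with many nested entities quickly produce many competing pairs for a classifier to separate; fewer false-positive pairs is exactly what raises precision.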

Keywords

sentence simplification - named entity recognition - relation extraction - BioBERT - BERN2

ISSUE

Volume 10, Issue 4 (2023)
