Information, Vol. 11, No. 9 (2020)
ARTICLE

Modeling the Paraphrase Detection Task over a Heterogeneous Graph Network with Data Augmentation

Rafael T. Anchiêta, Rogério F. de Sousa and Thiago A. S. Pardo

Abstract

Paraphrase detection is a Natural Language Processing (NLP) task that aims to automatically identify whether two sentences convey the same meaning, even when they use different words. For Portuguese, most existing work models this task as a machine-learning problem: extracting features and training a classifier. In this paper, we follow a different line: we explore a graph-based representation and model the paraphrase identification task over a heterogeneous network. We also adopt a back-translation strategy for data augmentation to balance the dataset we use. Although simple, our approach outperforms the best results previously reported for paraphrase detection in Portuguese, showing that graph structures may better capture the semantic relatedness between sentences.
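The abstract does not say how the heterogeneous network is built or how labels are inferred over it, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a bipartite network of sentence-pair nodes and word nodes, where each word is tagged `both:` if it occurs in both sentences of a pair and `only:` if it occurs in just one. It then uses plain iterative label propagation as a stand-in classifier. The helper names `pair_terms` and `propagate`, the class encoding (1 = paraphrase, 0 = non-paraphrase), and the iteration count are all hypothetical. The back-translation augmentation step is not shown.

```python
from collections import defaultdict


def pair_terms(s1, s2):
    """Hypothetical term extraction for one sentence pair.

    Words shared by both sentences become 'both:' nodes; words found in
    only one of the sentences become 'only:' nodes.
    """
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    return {f"both:{w}" for w in w1 & w2} | {f"only:{w}" for w in w1 ^ w2}


def propagate(pairs, labels, classes=(0, 1), iterations=10):
    """Label propagation over a bipartite pair-word network (assumed design).

    pairs: list of (sentence1, sentence2) tuples.
    labels: dict mapping pair index -> class for the labeled pairs.
    Returns the predicted class for every pair, labeled ones included.
    """
    # Build the heterogeneous network: pair nodes <-> term nodes.
    pair_to_terms = [pair_terms(a, b) for a, b in pairs]
    term_to_pairs = defaultdict(list)
    for i, terms in enumerate(pair_to_terms):
        for t in terms:
            term_to_pairs[t].append(i)

    # Labeled pairs are clamped to one-hot class distributions;
    # unlabeled pairs start from a uniform distribution.
    uniform = {c: 1.0 / len(classes) for c in classes}
    pair_dist = [
        {c: float(c == labels[i]) for c in classes} if i in labels else dict(uniform)
        for i in range(len(pairs))
    ]

    for _ in range(iterations):
        # Each term node takes the mean distribution of its neighboring pairs.
        term_dist = {
            t: {c: sum(pair_dist[i][c] for i in ids) / len(ids) for c in classes}
            for t, ids in term_to_pairs.items()
        }
        # Each unlabeled pair node takes the mean distribution of its terms.
        for i, terms in enumerate(pair_to_terms):
            if i in labels or not terms:
                continue
            pair_dist[i] = {
                c: sum(term_dist[t][c] for t in terms) / len(terms) for c in classes
            }

    # Predict the class with the highest propagated score for each pair.
    return [max(classes, key=lambda c: d[c]) for d in pair_dist]


if __name__ == "__main__":
    pairs = [
        ("the cat sat on the mat", "the cat sat on the mat"),  # labeled: paraphrase
        ("the dog ran", "a bird flew"),                         # labeled: not paraphrase
        ("the cat sat on a mat", "the cat sat on the mat"),     # unlabeled
        ("the dog ran home", "a bird flew away"),               # unlabeled
    ]
    print(propagate(pairs, labels={0: 1, 1: 0}))  # prints [1, 0, 1, 0]
```

Splitting each word into a `both:` and an `only:` node is one simple way to let shared vocabulary pull a pair toward the paraphrase class while unshared vocabulary pulls it the other way; a real system would likely use richer node types and edge weights.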

 Similar articles

       
 
MohammadHossein Reshadi, Wen Li, Wenjie Xu, Precious Omashor, Albert Dinh, Scott Dick, Yuntong She and Michael Lipsett    
Anomaly detection in data streams (and particularly time series) is today a vitally important task. Machine learning algorithms are a common design for achieving this goal. In particular, deep learning has, in the last decade, proven to be substantially ...
Journal: Algorithms

 
Catarina Palma, Artur Ferreira and Mário Figueiredo    
The presence of malicious software (malware), for example, in Android applications (apps), has harmful or irreparable consequences to the user and/or the device. Despite the protections app stores provide to avoid malware, it keeps growing in sophisticat...
Journal: Information

 
Jawaher Alghamdi, Yuqing Lin and Suhuai Luo    
The detection of fake news has emerged as a crucial area of research due to its potential impact on society. In this study, we propose a robust methodology for identifying fake news by leveraging diverse aspects of language representation and incorporati...
Journal: Information

 
Samuel de Oliveira, Oguzhan Topsakal and Onur Toker    
Automated Machine Learning (AutoML) is a subdomain of machine learning that seeks to expand the usability of traditional machine learning methods to non-expert users by automating various tasks which normally require manual configuration. Prior benchmark...
Journal: Information

 
Andrei Paraschiv, Teodora Andreea Ion and Mihai Dascalu    
The advent of online platforms and services has revolutionized communication, enabling users to share opinions and ideas seamlessly. However, this convenience has also brought about a surge in offensive and harmful language across various communication m...
Journal: Information