JOURNAL
Computation

ARTICLE

TITLE

Greedy Texts Similarity Mapping

Aliya Jangabylova

Alexander Krassovitskiy

Rustam Mussabayev and Irina Ualiyeva

Abstract

The documents similarity metric is a substantial tool applied in areas such as determining topic in relation to documents, plagiarism detection, or problems necessary to capture the semantic, syntactic, or structural similarity of texts. Evaluated results of the similarity measure depend on the types of word represented and the problem statement and can be time-consuming. In this paper, we present a problem-independent algorithm of the similarity metric greedy texts similarity mapping (GTSM), which is computationally efficient to be applied for large datasets with any preferred word vectorization models. GTSM maps words in two texts based on a decision rule that evaluates word similarity and their importance to the texts. We compare it with the well-known word mover's distance (WMD) algorithm in the k-nearest neighbors text classification problem and find that it leads to similar or better results. In the correlation evaluation task of similarity measures with human-judged scores, we demonstrate its higher correlation scores in comparison with WMD and sentence mover's similarity (SMS) and show that GTSM is a decent alternative for both word-level and sentence-level tasks.
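The abstract does not spell out the GTSM decision rule, so the following is only a rough sketch of the general idea of greedily mapping words between two texts by similarity and importance. It is not the authors' algorithm: the function name `greedy_similarity`, the multiplicative importance weighting, the one-to-one matching, and the normalisation by the longer text are all assumptions made for illustration, and the toy vectors stand in for any real word vectorization model.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def greedy_similarity(text_a, text_b, vectors, weights=None):
    """Illustrative greedy word mapping; NOT the paper's exact decision rule.

    text_a, text_b: lists of tokens.
    vectors: dict mapping token -> embedding (list of floats).
    weights: optional dict mapping token -> importance (assumed; default 1.0).
    """
    weights = weights or {}
    words_a = [w for w in text_a if w in vectors]
    words_b = [w for w in text_b if w in vectors]
    if not words_a or not words_b:
        return 0.0

    # Score every candidate pair: similarity scaled by both words' importance.
    pairs = []
    for i, wa in enumerate(words_a):
        for j, wb in enumerate(words_b):
            sim = cosine(vectors[wa], vectors[wb])
            importance = weights.get(wa, 1.0) * weights.get(wb, 1.0)
            pairs.append((sim * importance, i, j))

    # Greedy step: take the best remaining pair whose words are both unmatched.
    pairs.sort(key=lambda p: p[0], reverse=True)
    used_a, used_b = set(), set()
    total = 0.0
    for score, i, j in pairs:
        if i in used_a or j in used_b:
            continue
        used_a.add(i)
        used_b.add(j)
        total += score

    # Assumed normalisation: unmatched words in the longer text lower the score.
    return total / max(len(words_a), len(words_b))


# Toy embeddings, invented for this example only.
toy_vectors = {"cat": [1.0, 0.0], "kitten": [0.9, 0.1], "car": [0.0, 1.0]}
close = greedy_similarity(["cat"], ["kitten"], toy_vectors)
far = greedy_similarity(["cat"], ["car"], toy_vectors)
```

With these toy vectors, `close` is larger than `far`, since "cat" and "kitten" point in nearly the same direction while "cat" and "car" are orthogonal. A score of this kind could then be plugged into a k-nearest neighbors classifier in place of WMD, which is the comparison the abstract describes.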

Keywords

text similarity - word mover distance - k-nearest neighbors - word embedding - text classification

ISSUE

Volume: 10, Issue: 11 (2022)

SUBJECTS

ENGINEERING AND CIVIL CONSTRUCTION
TECHNOLOGY

SIMILAR JOURNALS

Applied Sciences
Information
Algorithms

DOI

https://doi.org/10.3390/computation10110200

Similar articles

Uyghur–Kazakh–Kirghiz Text Keyword Extraction Based on Morpheme Segmentation

Sardar Parhat, Mutallip Sattar, Askar Hamdulla and Abdurahman Kadir

In this study, based on a morpheme segmentation framework, we researched a text keyword extraction method for Uyghur, Kazakh and Kirghiz languages, which have similar grammatical and lexical structures. In these languages, affixes and a stem are joined t...

Journal: Information

Document-Level Event Role Filler Extraction Using Key-Value Memory Network

Hao Wang, Miao Li, Jianyong Duan, Li He and Qing Zhang

Previous work has demonstrated that end-to-end neural sequence models work well for document-level event role filler extraction. However, the end-to-end neural network model suffers from the problem of not being able to utilize global information, result...

Journal: Applied Sciences

Enhancing Traceability Link Recovery with Fine-Grained Query Expansion Analysis

Tao Peng, Kun She, Yimin Shen, Xiangliang Xu and Yue Yu

Requirement traceability links are an essential part of requirement management software and are a basic prerequisite for software artifact changes. The manual establishment of requirement traceability links is time-consuming. When faced with large projec...

Journal: Information

On Isotropy of Multimodal Embeddings

Kirill Tyshchuk, Polina Karpikova, Andrew Spiridonov, Anastasiia Prutianova, Anton Razzhigaev and Alexander Panchenko

Embeddings, i.e., vector representations of objects, such as texts, images, or graphs, play a key role in deep learning methodologies nowadays. Prior research has shown the importance of analyzing the isotropy of textual embeddings for transformer-based ...

Journal: Information

Semisupervised Speech Data Extraction from Basque Parliament Sessions and Validation on Fully Bilingual Basque–Spanish ASR

Mikel Penagarikano, Amparo Varona, Germán Bordel and Luis Javier Rodriguez-Fuentes

In this paper, a semisupervised speech data extraction method is presented and applied to create a new dataset designed for the development of fully bilingual Automatic Speech Recognition (ASR) systems for Basque and Spanish. The dataset is drawn from an...

Journal: Applied Sciences
