JOURNAL
Big Data and Cognitive Computing

ARTICLE

TITLE

Defining Semantically Close Words of Kazakh Language with Distributed System Apache Spark

Dauren Ayazbayev

Andrey Bogdanchikov

Kamila Orynbekova and Iraklis Varlamis

Abstract

This work focuses on determining semantically close words and, more generally, on using semantic similarity to improve performance in information retrieval tasks. Measuring the semantic similarity of words is an important task with many applications, from information retrieval to spell checking and even document clustering and classification. Although the methods and tools for this task are well established in languages with rich linguistic resources, some languages do not have such tools. The first step in our experiment is to represent the words in a collection in vector form and then define the semantic similarity of the terms using a vector similarity method. The complexity of the task depends on the number of word pairs (and, consequently, vector pairs) that must be compared to find the semantically closest ones. To tame this complexity, a distributed method that runs on Apache Spark is designed to reduce the calculation time by running comparison tasks in parallel. Three alternative implementations are proposed and tested using a list of target words, seeking for each one the most semantically similar words from a lexicon. In a second step, we employ pre-trained multilingual sentence transformers to capture content semantics at the sentence level and a vector-based semantic index to accelerate the searches. The code is written in MapReduce, and the experiments and results show that the proposed methods can provide an interesting solution for finding similar words or texts in the Kazakh language.
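As an illustration of the vector-similarity step described in the abstract, the following is a minimal single-machine sketch in plain Python, not the authors' implementation. It assumes cosine similarity as the vector similarity measure (cosine appears among the article's keywords). The function names `cosine_similarity` and `closest_words`, and the three-dimensional toy vectors, are hypothetical and chosen only for this example.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_u * norm_v)


def closest_words(
    target: str, lexicon: Dict[str, List[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k lexicon words most similar to `target`, best first."""
    target_vec = lexicon[target]
    scored = [
        (word, cosine_similarity(target_vec, vec))
        for word, vec in lexicon.items()
        if word != target  # do not compare the target with itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy vectors, invented for this sketch; real word vectors
# would come from a trained embedding model and have many more dimensions.
toy_lexicon = {
    "book": [1.0, 0.0, 0.0],
    "notebook": [0.9, 0.1, 0.0],
    "apple": [0.0, 1.0, 0.0],
    "pear": [0.0, 0.9, 0.1],
}

if __name__ == "__main__":
    for word, score in closest_words("book", toy_lexicon, k=2):
        print(f"{word}\t{score:.4f}")
```

Each target word is scored against every word in the lexicon independently of the other targets, which is why the work can be split across workers: in a distributed setting, each worker could run a function like `closest_words` on its own share of the target list.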

Keywords

cosine - natural language processing - semantically close words - Apache Spark - FAISS - vector

PAGES

pp. 0 - 0

ISSUE

Volume: 7 Part: 4 (2023)

SUBJECTS

INFRASTRUCTURE

SIMILAR JOURNALS

ISPRS International Journal of Geo-Information
Big Data and Cognitive Computing
Future Internet

DOI

https://doi.org/10.3390/bdcc7040160

Similar articles

Bag of Geomorphological Words: A Framework for Integrating Terrain Features and Semantics to Support Landform Object Recognition from High-Resolution Digital Elevation Models

Xiran Zhou, Xiao Xie, Yong Xue, Bing Xue, Kai Qin and Weijiang Dai

High-resolution digital elevation models (DEMs) and their derivatives (e.g., curvature, slope, aspect) offer a great possibility of representing the details of Earth's surface in three-dimensional space. Previous research investigations concerning geomorph...

Journal: ISPRS International Journal of Geo-Information

Word Sense Disambiguation Using Cosine Similarity Collaborates with Word2vec and WordNet

Korawit Orkphol and Wu Yang

Words have different meanings (i.e., senses) depending on the context. Disambiguating the correct sense is an important and challenging task for natural language processing. An intuitive way is to select the highest similarity between the context and sens...

Journal: Future Internet

Breast Cancer Diagnosis System Based on Semantic Analysis and Choquet Integral Feature Selection for High Risk Subjects

Soumaya Trabelsi Ben Ameur, Dorra Sellami, Laurent Wendling and Florence Cloppet

In this work, we build a computer-aided diagnosis (CAD) system of breast cancer for high-risk patients considering the breast imaging reporting and data system (BIRADS), mapping main expert concepts and rules. Therefore, a bag of words is built based on ...

Journal: Big Data and Cognitive Computing

Modelling Early Word Acquisition through Multiplex Lexical Networks and Machine Learning

Massimo Stella

Early language acquisition is a complex cognitive task. Recent data-informed approaches showed that children do not learn words uniformly at random but rather follow specific strategies based on the associative representation of words in the mental lexic...

Journal: Big Data and Cognitive Computing

Topic-Specific Emotion Mining Model for Online Comments

Xiangfeng Luo and Yawen Yi

Nowadays, massive texts are generated on the web, which contain a variety of viewpoints, attitudes, and emotions for products and services. Subjective information mining of online comments is vital for enterprises to improve their products or services an...

Journal: Future Internet
