Applied Sciences, Vol. 13, No. 9 (2023)
TITLE

Using Multiple Monolingual Models for Efficiently Embedding Korean and English Conversational Sentences

Youngki Park and Youhyun Shin    

Abstract

This paper presents a novel approach for finding the most semantically similar conversational sentences in Korean and English. Our method trains a separate embedding model for each language and uses a hybrid algorithm that selects the appropriate model based on the language of the query. For the Korean model, we fine-tuned the KLUE-RoBERTa-small model on publicly available semantic textual similarity datasets and applied Principal Component Analysis (PCA) to reduce the dimensionality of the resulting embedding vectors. For English, we selected a high-performing embedding model from the available SBERT models. We compared our approach with existing multilingual models on both human-generated and large language model-generated conversational datasets. The experimental results show that our hybrid approach outperforms state-of-the-art multilingual models in accuracy, in elapsed time for sentence embedding, and in elapsed time for finding the nearest neighbor, regardless of whether a GPU is used. These findings highlight the potential benefits of training separate embedding models for different languages, particularly for tasks that involve finding the most semantically similar conversational sentences. We expect our approach to be useful in diverse natural language processing-related fields, including machine learning education.
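
The routing idea described in the abstract (pick a per-language embedding model from the language of the query, then search for the nearest neighbor) can be illustrated with a minimal sketch. This is not the authors' implementation: `is_korean`, `toy_embed`, and `HybridIndex` are hypothetical names, the Hangul-range check is an assumed language detector, and the character-hashing embedder is only a placeholder for the fine-tuned KLUE-RoBERTa-small and SBERT models. PCA reduction is omitted.

```python
import math
from functools import partial
from typing import Callable, Dict, List, Tuple

Vector = List[float]


def is_korean(text: str) -> bool:
    """Assumed heuristic: Korean if the text contains any Hangul syllable."""
    return any("\uac00" <= ch <= "\ud7a3" for ch in text)


def toy_embed(text: str, dim: int = 8, salt: int = 0) -> Vector:
    """Placeholder embedder (character-count hashing), NOT a real model."""
    vec = [0.0] * dim
    for ch in text.lower():
        vec[(ord(ch) + salt) % dim] += 1.0
    return vec


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class HybridIndex:
    """Keeps one embedder and one vector store per language ("ko", "en")."""

    def __init__(self, embedders: Dict[str, Callable[[str], Vector]]):
        self.embedders = embedders
        self.sentences: Dict[str, List[str]] = {k: [] for k in embedders}
        self.vectors: Dict[str, List[Vector]] = {k: [] for k in embedders}

    @staticmethod
    def detect(text: str) -> str:
        return "ko" if is_korean(text) else "en"

    def add(self, sentence: str) -> None:
        lang = self.detect(sentence)
        self.sentences[lang].append(sentence)
        self.vectors[lang].append(self.embedders[lang](sentence))

    def nearest(self, query: str) -> Tuple[str, str]:
        """Embed the query with its language's model and search only that store."""
        lang = self.detect(query)
        if not self.vectors[lang]:
            raise ValueError(f"no indexed sentences for language {lang!r}")
        q = self.embedders[lang](query)
        scores = [cosine(q, v) for v in self.vectors[lang]]
        best = max(range(len(scores)), key=scores.__getitem__)
        return lang, self.sentences[lang][best]


if __name__ == "__main__":
    index = HybridIndex({
        "en": partial(toy_embed, salt=0),  # stand-in for an English model
        "ko": partial(toy_embed, salt=3),  # stand-in for a Korean model
    })
    for s in ["hello there", "good night", "안녕하세요", "잘 자요"]:
        index.add(s)
    print(index.nearest("hello there"))
    print(index.nearest("안녕하세요"))
```

Because each query is embedded by a single monolingual model and compared only against vectors from that same model, the search space per query is also smaller than with one shared multilingual index. Swapping the placeholder embedders for real models would only require passing different callables to `HybridIndex`.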
