JOURNAL
Information

ARTICLE

TITLE

Uyghur–Kazakh–Kirghiz Text Keyword Extraction Based on Morpheme Segmentation

Sardar Parhat, Mutallip Sattar, Askar Hamdulla and Abdurahman Kadir

Abstract

In this study, based on a morpheme segmentation framework, we investigated a text keyword extraction method for Uyghur, Kazakh and Kirghiz, three languages with similar grammatical and lexical structures. In these languages, a word is formed by attaching affixes to a stem: the stem is the word particle that carries the notional (lexical) meaning, while the affixes perform grammatical functions. Because of this derivational morphology, the vocabularies of these languages are huge, so pre-processing is a necessary step in NLP tasks for Uyghur, Kazakh and Kirghiz. Morpheme segmentation enabled us to remove suffixes as auxiliary units while retaining the meaningful stems, which reduced the dimension of the feature space in the keyword extraction task for Uyghur, Kazakh and Kirghiz texts. We transformed morpheme segmentation into a sequence labeling problem and used a Bi-LSTM network to obtain the position features of character sequences in both directions. We then applied a CRF layer to learn the dependencies between preceding and following labels, building a highly accurate Bi-LSTM_CRF morpheme segmentation model, and used this model to prepare morpheme-based experimental text sets. Next, after training stem embedding vectors with the Doc2vec algorithm, we used the similarity between stem vectors to modify the TextRank algorithm and performed a text keyword extraction experiment. The highest F1 scores obtained on the three datasets were 43.8%, 44% and 43.9%. The experimental results show that the morpheme-based approach performs much better than the word-based approach and that stem vector similarity weighting is an efficient method for the text keyword extraction task, demonstrating the effectiveness of morpheme sequences for morphologically derivational languages.
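The abstract names the ingredients but not the tag set or the exact TextRank modification, so the following is only a minimal sketch under stated assumptions. It illustrates two ideas: (1) casting morpheme segmentation as character labeling, assuming a BMES tag set (B/M/E = begin/middle/end of a multi-character morpheme, S = single-character morpheme), which a Bi-LSTM_CRF model would be trained to predict; and (2) a TextRank variant in which co-occurrence edges between stems are weighted by the cosine similarity of their vectors, using the standard weighted TextRank update. The function names, the window size, the damping factor 0.85, the clipping of negative similarities and the toy vectors are all hypothetical; in the described pipeline the vectors would come from Doc2vec training, and the authors' actual weighting may differ.

```python
import math
from collections import defaultdict


def to_bmes(morphemes):
    """Turn a segmented word into one label per character (assumed BMES scheme).

    B/M/E = begin/middle/end of a multi-character morpheme, S = one-character morpheme.
    A sequence labeler such as a Bi-LSTM_CRF model would be trained to predict these tags.
    """
    labels = []
    for m in morphemes:
        if not m:
            continue  # ignore empty segments
        if len(m) == 1:
            labels.append("S")
        else:
            labels.extend(["B"] + ["M"] * (len(m) - 2) + ["E"])
    return labels


def from_bmes(word, labels):
    """Rebuild morphemes from a word and a (predicted) label sequence."""
    morphemes, current = [], ""
    for ch, tag in zip(word, labels):
        current += ch
        if tag in ("E", "S"):  # a morpheme ends at this character
            morphemes.append(current)
            current = ""
    if current:  # keep a trailing morpheme that lacked a closing tag
        morphemes.append(current)
    return morphemes


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def weighted_textrank(stems, vectors, window=2, d=0.85, iters=50):
    """Rank stems with TextRank, weighting co-occurrence edges by stem-vector similarity.

    Hypothetical variant: the edge weight between two stems that co-occur within
    `window` positions is their cosine similarity (negative values clipped to 0).
    `vectors` maps each stem to an embedding, e.g. one learned with Doc2vec.
    """
    weights = defaultdict(float)
    for i, a in enumerate(stems):
        for b in stems[i + 1:i + 1 + window]:
            if a != b:
                w = max(cosine(vectors[a], vectors[b]), 0.0)
                weights[(a, b)] += w  # undirected graph: add both directions
                weights[(b, a)] += w
    out_sum = defaultdict(float)
    for (a, _), w in weights.items():
        out_sum[a] += w
    nodes = sorted(set(stems))
    score = {n: 1.0 for n in nodes}
    for _ in range(iters):
        # WS(n) = (1 - d) + d * sum_a [w(a, n) / sum_k w(a, k)] * WS(a), using last round's WS
        score = {
            n: (1 - d) + d * sum(
                w / out_sum[a] * score[a]
                for (a, b), w in weights.items()
                if b == n and out_sum[a] > 0
            )
            for n in nodes
        }
    return sorted(nodes, key=score.get, reverse=True)


if __name__ == "__main__":
    labels = to_bmes(["kitab", "lar"])  # Uyghur "kitablar": stem kitab + plural -lar
    print(labels)  # ['B', 'M', 'M', 'M', 'E', 'B', 'M', 'E']
    print(from_bmes("kitablar", labels))  # ['kitab', 'lar']
    # Toy 2-D "stem vectors" (made up for illustration, not Doc2vec output).
    vecs = {"til": [1.0, 0.2], "kitab": [0.9, 0.3], "oqush": [0.8, 0.1]}
    print(weighted_textrank(["til", "kitab", "til", "oqush", "til"], vecs))
```

The graph is meant to be built over stems only: once suffixes are stripped, inflected forms of the same word collapse into a single node, which is the feature-space reduction the abstract describes. Clipping negative similarities keeps every edge weight non-negative, so the normalized weights behave like transition probabilities in the PageRank-style update.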

Keywords

Uyghur–Kazakh–Kirghiz - keyword extraction - morpheme segmentation - stem extraction - stem vector - TextRank

ISSUE

Volume: 14 Issue: 5 (2023)

SUBJECTS

ENGINEERING AND CIVIL CONSTRUCTION
TECHNOLOGY

SIMILAR JOURNALS

Information
Applied Sciences
Aerospace

DOI

https://doi.org/10.3390/info14050283

Similar articles

Ernie-Gram BiGRU Attention: An Improved Multi-Intention Recognition Model for Air Traffic Control

Weijun Pan, Peiyuan Jiang, Zhuang Wang, Yukun Li and Zhenlong Liao

In recent years, the emergence of large-scale pre-trained language models has made transfer learning possible in natural language processing, which overturns the traditional model architecture based on recurrent neural networks (RNN). In this study, we c...

Journal: Aerospace

Blockchain Propels Tourism Industry: An Attempt to Explore Topics and Information in Smart Tourism Management through Text Mining and Machine Learning

Vikram Puri, Subhra Mondal, Subhankar Das and Vasiliki G. Vrana

Blockchain and immersive technology are the pioneers in bringing digitalization to tourism, and researchers worldwide are exploring many facets of these techniques. This paper analyzes the various aspects of blockchain technology and its potential use in...

Journal: Informatics

State of Industry 5.0: Analysis and Identification of Current Research Trends

Aditya Akundi, Daniel Euresti, Sergio Luna, Wilma Ankobiah, Amit Lopes and Immanuel Edinbarough

The term Industry 4.0, coined to be the fourth industrial revolution, refers to a higher level of automation for operational productivity and efficiency by connecting virtual and physical worlds in an industry. With Industry 4.0 being unable to address a...

Journal: Applied System Innovation

Results of automatic mining individual fields of personal data operators register

P. Yu. Pushkin, A.M. Rusakov, pp. 37-47

The work presents the results of mining the records contained in the fields of the register personal data operators "list of actions with personal data" and "period or condition of termination personal data processing" and assessment of their compliance ...

Journal: International Journal of Open Information Technologies

A Query Understanding Framework for Earth Data Discovery

Yun Li, Yongyao Jiang, Justin C. Goldstein, Lewis J. Mcgibbney and Chaowei Yang

One longstanding complication with Earth data discovery involves understanding a user's search intent from the input query. Most of the geospatial data portals use keyword-based match to search data. Little attention has focused on the spatial and tempor...

Journal: Applied Sciences
