Learning Subword Embedding to Improve Uyghur Named-Entity Recognition

Alimu Saimaiti

Lulu Wang and Tuergen Yibulayin

Abstract

Uyghur is a morphologically rich, typically agglutinative language, and morphological segmentation affects the performance of Uyghur named-entity recognition (NER). Common Uyghur NER systems take the word sequence as input and rely heavily on feature engineering. However, when only the word sequence is considered, semantic information cannot be fully learned, and the model easily suffers from data sparsity arising from morphological processes. To address this problem, we propose a neural network architecture that combines subword embeddings with character embeddings, built on a bidirectional long short-term memory network with a conditional random field layer. Our experiments show that subword embeddings effectively improve Uyghur NER performance, and that the proposed method outperforms the word-sequence-based model.
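The data-sparsity argument can be made concrete with a small sketch. The code below is only a toy illustration, not the segmenter or data used in the paper: the suffix inventory `SUFFIXES`, the greedy `segment` helper, the `oov_rate` helper, and the Latin-script word forms are all assumptions introduced here. It compares how many test units are unseen in training when the unit is the whole word versus a stem or suffix.

```python
# Toy illustration only: the suffix list, helpers, and word forms below are
# assumptions for this sketch, not the paper's segmenter or data.
SUFFIXES = ("ning", "ler", "lar", "de", "da", "te", "ta", "ni")  # hypothetical inventory


def segment(word, suffixes=SUFFIXES, min_stem=3):
    """Greedily strip known suffixes from the right: returns [stem, -suf, ...]."""
    ordered = sorted(suffixes, key=len, reverse=True)  # try longer suffixes first
    pieces = []
    stripped = True
    while stripped:
        stripped = False
        for suf in ordered:
            # Only strip if a stem of at least `min_stem` characters remains.
            if word.endswith(suf) and len(word) - len(suf) >= min_stem:
                pieces.insert(0, "-" + suf)
                word = word[: -len(suf)]
                stripped = True
                break
    return [word] + pieces


def oov_rate(train_units, test_units):
    """Fraction of test units that never occur among the training units."""
    vocab = set(train_units)
    unseen = [u for u in test_units if u not in vocab]
    return len(unseen) / len(test_units)


# Illustrative inflected forms sharing a few stems.
train_words = ["mektep", "mektepler", "sheherde"]
test_words = ["mekteplerde", "sheherler", "mektepte"]

# Word level: every inflected test form is a new vocabulary entry.
word_oov = oov_rate(train_words, test_words)

# Subword level: stems and suffixes seen in training are reused.
train_sub = [p for w in train_words for p in segment(w)]
test_sub = [p for w in test_words for p in segment(w)]
sub_oov = oov_rate(train_sub, test_sub)

print(f"word-level OOV rate:    {word_oov:.2f}")
print(f"subword-level OOV rate: {sub_oov:.2f}")
```

In this toy setting none of the test words were seen in training, while most of their stems and suffixes were. That is the intuition behind feeding subword units, rather than whole words, to the embedding layer of a sequence tagger.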

Keywords

subword embedding - Uyghur - named-entity recognition - morphological processing - word sequence - natural language processing - deep learning - word-based neural model

Access

ISSUE

Volume 10, Issue 4 (2019)

SUBJECTS

CIVIL ENGINEERING AND CONSTRUCTION
TECHNOLOGY
