Future Internet, Vol. 11, Issue 1 (2019)

Improved Arabic–Chinese Machine Translation with Linguistic Input Features

Fares Aqlan, Xiaoping Fan, Abdullah Alqwbani and Akram Al-Mansoub

Abstract

This study presents linguistically augmented models of phrase-based statistical machine translation (PBSMT) that use different linguistic features (factors) on top of the source surface form. The architecture addresses two major problems in machine translation: the poor performance of direct translation from a highly inflected, morphologically complex language into a morphologically poor one, and data sparseness, which becomes a significant challenge under low-resource conditions. We use three factors (lemma, part-of-speech tags, and morphological features) to enrich the input side with additional information and so improve the quality of direct translation from Arabic to Chinese, considering the importance and global presence of this language pair as well as the limited work on machine translation between these two languages. To deal with out-of-vocabulary (OOV) words and missing words, we propose the best combination of factors and models based on alternative paths. The proposed models were compared with the standard PBSMT model, which serves as the baseline of this work, and with two enhanced approaches tokenized by a state-of-the-art external tool that has proven useful for Arabic as a morphologically rich and complex language. The experiments were performed with the Moses decoder on freely available data extracted from a multilingual corpus of United Nations documents (MultiUN). Results of a preliminary evaluation in terms of BLEU scores show that using linguistic features on the Arabic side considerably outperforms the baseline and tokenized approaches; the system also consistently reduces the OOV rate.
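As a rough illustration of the factored input described above, the following Python sketch writes each source token as a surface form joined to its lemma, part-of-speech tag, and morphological features with the `|` separator used for factored text in Moses. It also shows a simplified lemma backoff lookup and an OOV-rate calculation. The `Token` class, the function names, the placeholder tokens, and the dictionary-based backoff are all assumptions made for illustration; they are not the authors' pipeline, and the backoff function only mimics the idea of alternative paths rather than how the Moses decoder implements them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One source word with its three linguistic factors (hypothetical layout)."""
    surface: str
    lemma: str
    pos: str
    morph: str


def to_factored(tokens, sep="|"):
    """Render tokens as 'surface|lemma|pos|morph', separated by spaces."""
    return " ".join(
        sep.join((t.surface, t.lemma, t.pos, t.morph)) for t in tokens
    )


def translate_with_backoff(token, surface_table, lemma_table):
    """Try the surface form first, then fall back to the lemma.

    Returns None when neither lookup succeeds, i.e. the word stays OOV.
    This is a toy stand-in for alternative decoding paths.
    """
    if token.surface in surface_table:
        return surface_table[token.surface]
    return lemma_table.get(token.lemma)


def oov_rate(test_words, train_vocab):
    """Fraction of test words that never occur in the training vocabulary."""
    if not test_words:
        return 0.0
    unknown = sum(1 for w in test_words if w not in train_vocab)
    return unknown / len(test_words)


if __name__ == "__main__":
    # Placeholder tokens; real factors would come from a morphological analyzer.
    sentence = [
        Token("w1", "l1", "NOUN", "num=s"),
        Token("w2", "l2", "VERB", "asp=p"),
    ]
    print(to_factored(sentence))
    print(oov_rate(["a", "b", "c", "d"], {"a", "b"}))
```

The backoff lookup hints at why lemma factors can lower the OOV rate: an inflected surface form missing from the training data may still share a lemma with a form that was seen.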
