ARTICLE
TITLE

A BI-TECHNICAL ANALYSIS FOR ARABIC STOP-WORDS DETECTION

Driss Namly    
Karim Bouzoubaa    
Abdellah Yousfi    

Abstract

Stop words are defined as words that appear frequently in texts without carrying any significant information. For the Arabic language, existing works suffer from two main drawbacks: (i) the use of proprietary corpora only and (ii) reliance on the frequency metric alone. Our approach to automatic Arabic stop-word detection uses a new metric based on a supervised machine learning process and a vector space representation. It can be applied to any corpus and takes into account both domain-independent and domain-dependent stop words. Experiments conducted to evaluate the proposed approach show a significant improvement, with the detection rate reaching 91.85% as measured by the F-measure.
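
To make the second drawback concrete, the following is a minimal sketch of a frequency-only baseline together with an F-measure computation. It is not the authors' method: the function names `frequency_candidates` and `f_measure`, the toy corpus, and the gold stop-word list are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def frequency_candidates(docs, top_k):
    """Return the top_k most frequent whitespace-separated tokens as stop-word candidates."""
    counts = Counter(token for doc in docs for token in doc.split())
    return {token for token, _ in counts.most_common(top_k)}


def f_measure(predicted, gold):
    """Harmonic mean of precision and recall between two sets of words."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy corpus and gold list: hypothetical data, for illustration only.
docs = [
    "ذهب الولد إلى المدرسة في الصباح",
    "رجع الولد من المدرسة في المساء",
    "قرأ الطالب الكتاب في المكتبة",
]
gold = {"في", "من", "إلى"}

candidates = frequency_candidates(docs, top_k=3)
score = f_measure(candidates, gold)
print(candidates, round(score, 4))
```

On this toy corpus, frequent content words from a narrow topic crowd out genuine function words, which suggests why a frequency count alone can be an unreliable signal and why a richer metric may help.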
