ARTICLE
TITLE

Two-Step Cluster based Feature Discretization of Naive Bayes for Outlier Detection in Intrinsic Plagiarism Detection

Adi Wijaya    
Romi Satria Wahono    

Abstract

Intrinsic plagiarism detection is the task of analyzing a document for undeclared changes in writing style, which are treated as outliers. Naive Bayes is often used for outlier detection. However, Naive Bayes assumes that the values of continuous features are normally distributed; this assumption is strongly violated in practice, which causes low classification performance. Discretization of continuous features can improve the performance of Naive Bayes. In this study, feature discretization based on Two-Step Cluster is proposed for Naive Bayes. The proposed method uses tf-idf and a query language model as feature creators, together with a False Positive/False Negative (FP/FN) threshold that aims to improve accuracy, and is evaluated on the PAN PC 2009 dataset. The results indicate that the proposed method with discrete features outperforms the same method with continuous features on all evaluation measures: recall, precision, f-measure, and accuracy. Using the FP/FN threshold also affects the results, since it decreases both FP and FN and thus improves all evaluation measures.
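
The general idea of discretizing a continuous feature by clustering before applying Naive Bayes can be illustrated with a minimal sketch. This is not the authors' implementation: Two-Step Cluster is replaced here by a plain one-dimensional k-means as a stand-in, the style scores and labels are invented toy data, and every function name (`kmeans_1d`, `nearest`, `train_nb`, `predict_nb`) is hypothetical.

```python
import math
from collections import Counter

def nearest(centers, v):
    """Index of the cluster center closest to v; this index is the discrete bin."""
    return min(range(len(centers)), key=lambda i: abs(centers[i] - v))

def kmeans_1d(values, k, iters=50):
    """Plain 1-D k-means. ASSUMPTION: a simple stand-in for Two-Step Cluster."""
    s = sorted(values)
    # Deterministic initialisation: spread the initial centers over the sorted values.
    centers = [s[(2 * i + 1) * len(s) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            groups[nearest(centers, v)].append(v)
        # An empty cluster keeps its previous center.
        new = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if new == centers:
            break
        centers = new
    return sorted(centers)

def train_nb(rows, labels, n_bins):
    """Count class frequencies and (feature, bin, class) co-occurrences."""
    class_counts = Counter(labels)
    counts = Counter()
    for row, y in zip(rows, labels):
        for j, b in enumerate(row):
            counts[(j, b, y)] += 1
    return class_counts, counts, n_bins

def predict_nb(model, row):
    """Categorical Naive Bayes prediction with Laplace (add-one) smoothing."""
    class_counts, counts, n_bins = model
    total = sum(class_counts.values())

    def log_posterior(y):
        lp = math.log(class_counts[y] / total)
        for j, b in enumerate(row):
            lp += math.log((counts[(j, b, y)] + 1) / (class_counts[y] + n_bins))
        return lp

    return max(class_counts, key=log_posterior)

# Toy data (invented): one continuous style-deviation score per passage.
scores = [0.10, 0.12, 0.15, 0.11, 0.14, 0.90, 0.85, 0.13, 0.95, 0.09]
labels = ["normal"] * 5 + ["outlier", "outlier", "normal", "outlier", "normal"]

centers = kmeans_1d(scores, k=2)                    # cluster the continuous feature
rows = [[nearest(centers, v)] for v in scores]      # replace each value by its bin
model = train_nb(rows, labels, n_bins=2)

pred_low = predict_nb(model, [nearest(centers, 0.2)])
pred_high = predict_nb(model, [nearest(centers, 0.8)])
print(pred_low, pred_high)
```

Because each value is replaced by a cluster index, the classifier estimates per-bin frequencies instead of fitting a normal distribution, which is the assumption the abstract identifies as the source of poor performance on continuous features.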
