ARTICLE
TITLE

Effect of Missing Data Types and Imputation Methods on Supervised Classifiers: An Evaluation Study

Menna Ibrahim Gabr, Yehia Mostafa Helmy and Doaa Saad Elzanfaly

Abstract

Incomplete data is one of the most common challenges that hinder the performance of data analytics platforms. Previous studies have assessed the effect of missing values on different classification models using a single evaluation metric, namely accuracy. However, accuracy on its own is a misleading measure of classifier performance because it does not account for unbalanced datasets. This paper presents an experimental study that assesses the effect of incomplete datasets on the performance of five classification models. The analysis was conducted with different ratios of missing values in six datasets that vary in size, type, and balance. Moreover, for an unbiased analysis, the performance of the classifiers was measured using three different metrics: the Matthews correlation coefficient (MCC), the F1-score, and accuracy. The results show that the sensitivity of the supervised classifiers to missing data depends on a set of factors. The most significant factor is the missing data pattern and ratio, followed by the imputation method, and then the type, size, and balance of the dataset. The classifiers are less sensitive when data are Missing Completely At Random (MCAR) than when data are Missing Not At Random (MNAR). Furthermore, using the MCC as an evaluation measure better reflects the variation in the sensitivity of the classifiers to the missing data.
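To illustrate why accuracy alone can mislead on an unbalanced dataset, the following is a minimal sketch that computes the three metrics named in the abstract from a binary confusion matrix. The data are hypothetical (95 negatives, 5 positives, and a classifier that always predicts the majority class) and are not taken from the study; the function names are likewise illustrative, and returning 0.0 when a denominator is zero is an assumed convention.

```python
import math

def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def f1_score(tp, tn, fp, fn):
    """Harmonic mean of precision and recall; 0.0 if undefined (assumed convention)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 if the denominator is zero (assumed convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical unbalanced test set: 95 negatives and 5 positives.
y_true = [0] * 95 + [1] * 5
# Hypothetical classifier that always predicts the majority (negative) class.
y_pred = [0] * 100

counts = confusion_counts(y_true, y_pred)
acc = accuracy(*counts)
f1 = f1_score(*counts)
m = mcc(*counts)

print(f"accuracy={acc:.2f}  F1={f1:.2f}  MCC={m:.2f}")
# prints: accuracy=0.95  F1=0.00  MCC=0.00
```

In this toy case accuracy is high even though the classifier never identifies a positive instance, while the F1-score and the MCC both fall to zero.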

Similar articles

       
 
Bowen Yang, Zunhao Liu, Zhi Cai, Dongze Li, Xing Su, Limin Guo and Zhiming Ding    
In order to improve the effect of path planning in emergencies, the missing position imputation and velocity restoration in vehicle trajectory provide data support for emergency path planning and analysis. At present, there are many methods to fill in th...

 
Benjamin Agbo, Hussain Al-Aqrabi, Richard Hill and Tariq Alsboui    
The Internet of Things (IoT) has had a tremendous impact on the evolution and adoption of information and communication technology. In the modern world, data are generated by individuals and collected automatically by physical objects that are fitted wit...
Journal: Future Internet

 
Mirko Duradoni, Stefania Collodi, Serena Coppolino Perfumi and Andrea Guazzini    
The stranger on the Internet effect has been studied in relation to self-disclosure. Nonetheless, quantitative evidence about how people mentally represent and perceive strangers online is still missing. Given the dynamic development of web technologies,...
Journal: Future Internet

 
Rebecca Schiel, Bruce M. Wilson and Malcolm Langford    
Ten years after the United Nations' recognition of the human right to water and sanitation (HRtWS), little is understood about how this right impacts access to sanitation. There is limited identification of the mechanisms responsible for improvements in...
Journal: Water

 
Santiago Cabrera, Marie Anne Eurie Forio, Koen Lock, Marte Vandenbroucke, Tania Oña, Miguel Gualoto, Peter L. M. Goethals and Christine Van der heyden    
Adequate environmental management in tropical aquatic ecosystems is imperative. Given the lack of knowledge about functional diversity and bioassessment programs, management is missing the needed evidence on pollution and its effect on biodiversity and f...
Journal: Water