Information, Vol. 14, No. 2 (2023)

Effective Handling of Missing Values in Datasets for Classification Using Machine Learning Methods

Ashokkumar Palanivinayagam and Robertas Damaševičius

Abstract

Missing values reduce the amount of information that machine learning models can learn during training, which lowers classification accuracy. To address this problem, we use Support Vector Machine (SVM) regression to impute the missing values. We also propose a two-level classification process to reduce the number of misclassifications. We evaluated the proposed method on the PIMA Indian dataset for diabetes classification, comparing five machine learning models: Naive Bayes (NB), Support Vector Machine (SVM), k-Nearest Neighbours (KNN), Random Forest (RF), and Linear Regression (LR). In our experiments, the SVM classifier achieved the highest accuracy (94.89%), the RF classifier the highest precision (98.80%), the SVM classifier the highest recall (85.48%), and the NB model the highest F1-score (95.59%). By dealing with missing values in the dataset, the proposed method offers a promising approach to detecting diabetes at an early stage. The results show that SVM-regression imputation combined with a two-level classification process can notably improve the performance of machine learning models for diabetes classification, and they underline the importance of handling missing values in machine learning applications.
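The abstract does not specify how the SVM regression imputer is configured. As a minimal sketch, assuming a linear kernel, missing entries marked as `None`, and complete predictor columns, one can train an epsilon-insensitive SVM regressor on the rows where the target column is observed and use it to predict the missing entries. The names `fit_linear_svr` and `svr_impute` and every hyperparameter value below are hypothetical, chosen only for illustration; training uses plain subgradient descent on the primal SVR objective, not the solver the authors or any library necessarily use.

```python
"""Minimal sketch: impute one column's missing values with a linear epsilon-SVR.

Assumptions (not taken from the paper): linear kernel, full-batch subgradient
descent on the primal objective, missing entries marked as None, and complete
predictor columns. All names and hyperparameters are hypothetical.
"""
from typing import List, Optional, Sequence, Tuple

Row = List[Optional[float]]


def fit_linear_svr(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    epsilon: float = 0.1,
    lam: float = 1e-3,
    lr: float = 0.05,
    epochs: int = 2000,
) -> Tuple[List[float], float]:
    """Minimise lam/2 * ||w||^2 + mean(max(0, |y - (w.x + b)| - epsilon))."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]  # gradient of the L2 penalty
        gb = 0.0
        for xi, yi in zip(X, y):
            r = yi - (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if abs(r) > epsilon:  # outside the tube: subgradient is -sign(r) * (x, 1)
                s = -1.0 if r > 0 else 1.0
                for j in range(d):
                    gw[j] += s * xi[j] / n
                gb += s / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def _mean_std(values: Sequence[float]) -> Tuple[float, float]:
    m = sum(values) / len(values)
    sd = (sum((v - m) ** 2 for v in values) / len(values)) ** 0.5
    return m, max(sd, 1e-12)  # guard against constant columns


def svr_impute(rows: Sequence[Row], target: int, **svr_kwargs) -> List[Row]:
    """Return a copy of `rows` with None entries in column `target` predicted by SVR."""
    feats = [j for j in range(len(rows[0])) if j != target]
    train = [r for r in rows if r[target] is not None]
    # Standardise predictors and target so one learning rate and epsilon suit any units.
    stats = [_mean_std([r[j] for r in train]) for j in feats]
    y_mean, y_std = _mean_std([r[target] for r in train])

    def scaled(r: Row) -> List[float]:
        return [(r[j] - m) / s for j, (m, s) in zip(feats, stats)]

    w, b = fit_linear_svr(
        [scaled(r) for r in train],
        [(r[target] - y_mean) / y_std for r in train],
        **svr_kwargs,
    )
    out: List[Row] = []
    for r in rows:
        r = list(r)  # copy so the caller's rows are not mutated
        if r[target] is None:
            z = sum(wj * xj for wj, xj in zip(w, scaled(r))) + b
            r[target] = y_mean + y_std * z  # undo the target scaling
        out.append(r)
    return out


if __name__ == "__main__":
    # Toy data: column 1 = 2 * column 0 + 1, with two entries missing.
    rows = [[float(x), 2.0 * x + 1.0] for x in range(10)] + [[4.5, None], [8.0, None]]
    filled = svr_impute(rows, target=1)
    print([round(r[1], 2) for r in filled[-2:]])
```

In practice a kernel SVR such as scikit-learn's `sklearn.svm.SVR` would be the more usual choice; the dependency-free version here simply makes the epsilon-insensitive loss explicit. Because residuals inside the epsilon tube are ignored, imputed values can sit slightly off the true line, which is the expected behaviour of this loss rather than a bug.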

Similar articles

       
 
Yin Tang, Lizhuo Zhang, Dan Huang, Sha Yang and Yingchun Kuang    
In view of the current problems of complex models and insufficient data processing in ultra-short-term prediction of photovoltaic power generation, this paper proposes a photovoltaic power ultra-short-term prediction model named HPO-KNN-SRU, based on a S...
Journal: Applied Sciences

 
Maria Carmela Groccia, Rosita Guido, Domenico Conforti, Corrado Pelaia, Giuseppe Armentaro, Alfredo Francesco Toscani, Sofia Miceli, Elena Succurro, Marta Letizia Hribal and Angela Sciacqua    
Chronic heart failure (CHF) is a clinical syndrome characterised by symptoms and signs due to structural and/or functional abnormalities of the heart. CHF confers risk for cardiovascular deterioration events which cause recurrent hospitalisations and hig...
Journal: Information

 
Meng Yu, Yaqiong Lv, Yuhang Wang and Xiaojing Ji    
Berth allocation is a critical concern in container terminal port logistics, involving the precise determination of where and when arriving vessels should dock along a quay. With berth space limitations and a continuous surge in container handling demand...

 
Laila Bouhouch, Mostapha Zbakh and Claude Tadonki    
The development of big data has generated data-intensive tasks that are usually time-consuming, with a high demand on cloud data centers for hosting big data applications. It becomes necessary to consider both data and task management to find the optimal...
Journal: Information

 
Chunyao Hou, Yilun Wei, Hongyi Zhang, Xuezhou Zhu, Dawen Tan, Yi Zhou and Yu Hu    
In response to the challenge of limited model availability for predicting the lifespan of super-high arch dams, a hybrid model named EMD-PSO-GPR (EPR) is proposed in this study. The EPR model leverages Empirical Mode Decomposition (EMD), Gaussian Process...
Journal: Water