Applied Sciences, Vol. 12, Issue 16 (2022)
Article

Generation of Controlled Synthetic Samples and Impact of Hyper-Tuning Parameters to Effectively Classify the Complex Structure of Overlapping Region

Zafar Mahmood    
Naveed Anwer Butt    
Ghani Ur Rehman    
Muhammad Zubair    
Muhammad Aslam    
Afzal Badshah and Syeda Fizzah Jilani    

Abstract

The classification of imbalanced and overlapping data has received sustained attention over the last decade, as most real-world applications comprise multiple classes with an imbalanced distribution of samples. Samples from different classes overlap near class boundaries, creating a complex structure for the underlying classifier. Because of the imbalanced distribution, the classifier favors samples from the majority class and ignores samples from the smallest minority class. The imbalanced nature of the data, resulting in overlapping regions, greatly affects the learning of machine learning classifiers, as most are designed for balanced datasets and perform poorly on imbalanced data. Improving learning on multi-class problems requires expertise in both traditional classifiers and the problem-domain datasets, along with some experimentation and knowledge of tuning the hyperparameters and parameters of the classifier under consideration. Several techniques for learning from multi-class problems have been reported in the literature, such as sampling techniques, algorithm adaptation methods, transformation methods, hybrid methods, and ensemble techniques. In the current work, we first analyzed the learning behavior of state-of-the-art ensemble and non-ensemble classifiers on imbalanced and overlapping multi-class data. We then used grid search to tune key parameters of the ensemble and non-ensemble classifiers and determine the parameter set that best improves learning on multi-class imbalanced classification problems, using 15 public datasets. After hyper-tuning, 20% of the dataset samples are synthetically generated and added to the majority class of each respective dataset to make it more overlapped (a more complex structure).
After the synthetic samples are added, the hyper-tuned ensemble and non-ensemble classifiers are tested on that complex structure. This paper also includes a brief description of the tuned parameters and their effects on imbalanced data, followed by a detailed comparison of ensemble and non-ensemble classifiers with default and tuned parameters on both the original and the synthetically overlapped datasets. We believe this paper is the first effort of its kind in this domain, and that it will open up various research directions with a greater focus on classifier parameters in learning from imbalanced data with machine-learning algorithms.
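
The workflow described in the abstract (grid search over classifier parameters, then adding synthetic majority-class samples to increase overlap, then re-testing the tuned classifier) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration, not the authors' method: the toy dataset from scikit-learn's `make_classification`, the random-forest parameter grid, balanced accuracy as the score, the hypothetical helper `add_overlapping_samples`, and its interpolation scheme for placing new points near other classes. The abstract does not state how the synthetic samples were generated or which metric was used.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split


def add_overlapping_samples(X, y, fraction=0.2, seed=0):
    """Append round(fraction * len(y)) synthetic majority-class points.

    Assumed scheme (not from the paper): each new point is an interpolation
    between a random majority-class sample and a random sample of another
    class, and it is labeled as majority so that the classes overlap more.
    """
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    majority = classes[np.argmax(counts)]
    n_new = int(round(fraction * len(y)))
    maj = X[rng.choice(np.flatnonzero(y == majority), size=n_new)]
    other = X[rng.choice(np.flatnonzero(y != majority), size=n_new)]
    t = rng.uniform(0.3, 0.7, size=(n_new, 1))  # position along each segment
    X_new = maj + t * (other - maj)
    y_new = np.full(n_new, majority, dtype=y.dtype)
    return np.vstack([X, X_new]), np.concatenate([y, y_new])


def tune_classifier(X_train, y_train, seed=0):
    """Grid-search a small, illustrative random-forest parameter grid."""
    param_grid = {
        "n_estimators": [50, 100],
        "max_depth": [None, 5],
        "class_weight": [None, "balanced"],
    }
    cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=seed)
    search = GridSearchCV(
        RandomForestClassifier(random_state=seed),
        param_grid,
        scoring="balanced_accuracy",  # averages recall over classes
        cv=cv,
    )
    return search.fit(X_train, y_train)


def run_demo(seed=0):
    # Toy imbalanced 3-class data standing in for one public dataset.
    X, y = make_classification(
        n_samples=600, n_features=8, n_informative=5, n_redundant=0,
        n_classes=3, n_clusters_per_class=1, weights=[0.7, 0.2, 0.1],
        random_state=seed,
    )
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, stratify=y, random_state=seed
    )
    search = tune_classifier(X_tr, y_tr, seed)
    score_original = balanced_accuracy_score(y_te, search.predict(X_te))

    # Make the data more overlapped, then refit with the tuned parameters.
    X_ov, y_ov = add_overlapping_samples(X, y, fraction=0.2, seed=seed)
    Xo_tr, Xo_te, yo_tr, yo_te = train_test_split(
        X_ov, y_ov, stratify=y_ov, random_state=seed
    )
    tuned = RandomForestClassifier(random_state=seed, **search.best_params_)
    tuned.fit(Xo_tr, yo_tr)
    score_overlapped = balanced_accuracy_score(yo_te, tuned.predict(Xo_te))
    return search.best_params_, score_original, score_overlapped


if __name__ == "__main__":
    params, s_orig, s_over = run_demo()
    print(params, round(s_orig, 3), round(s_over, 3))
```

Balanced accuracy is used in this sketch because plain accuracy can look high on imbalanced data even when minority classes are mostly misclassified; any other imbalance-aware metric could be substituted through the `scoring` argument.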
