Applied Sciences, Vol. 13, Issue 18 (2023)

Improving Software Defect Prediction in Noisy Imbalanced Datasets

Haoxiang Shi, Jun Ai, Jingyu Liu and Jiaxi Xu

Abstract

Software defect prediction is a popular method for optimizing software testing and improving software quality and reliability. However, software defect datasets usually have quality problems, such as class imbalance and data noise. Oversampling, which generates additional minority-class samples, is one of the best-known methods for improving dataset quality; however, it often introduces overfitting noise into the datasets. To improve the quality of these datasets further, this paper proposes a method called US-PONR, which first uses undersampling to remove duplicate samples that arise from version iterations and then uses oversampling through propensity score matching to reduce class imbalance and noise samples. The effectiveness of the method was validated in a software defect prediction experiment involving 24 versions of software data from 11 PROMISE projects, with noise levels ranging from 0% to 30%. Compared with 12 other advanced dataset processing methods, US-PONR significantly improved the quality of noisy imbalanced datasets, especially the noisiest ones. The experiments also demonstrated that US-PONR can effectively identify label noise samples and remove them.
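The two most mechanical steps mentioned in the abstract, removing duplicate samples carried over between versions and preparing a dataset at a chosen label-noise level, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `(features, label)` row layout, the choice to treat a sample as a duplicate only when both features and label match, and the random label-flipping noise model are all assumptions made for illustration. The propensity-score-matching oversampling step is deliberately left out, because the abstract does not describe it in enough detail to reproduce.

```python
import random


def drop_duplicate_samples(rows):
    """Keep the first occurrence of each (features, label) pair.

    Assumption: a duplicate is a row whose features AND label both match
    an earlier row; the paper may define duplicates differently.
    """
    seen = set()
    unique = []
    for features, label in rows:
        key = (tuple(features), label)
        if key not in seen:
            seen.add(key)
            unique.append((features, label))
    return unique


def inject_label_noise(rows, noise_level, seed=0):
    """Flip the binary label (0/1) of a random fraction of rows.

    Assumption: noise is simulated by uniform random label flipping.
    """
    rng = random.Random(seed)
    n_flip = round(noise_level * len(rows))
    flip_idx = set(rng.sample(range(len(rows)), n_flip))
    return [
        (features, 1 - label if i in flip_idx else label)
        for i, (features, label) in enumerate(rows)
    ]


# Hypothetical toy data: two versions of one project sharing unchanged modules.
version_1 = [([10, 2], 0), ([35, 7], 1), ([12, 1], 0)]
version_2 = [([10, 2], 0), ([35, 7], 1), ([50, 9], 1), ([8, 1], 0)]

merged = version_1 + version_2
deduplicated = drop_duplicate_samples(merged)
noisy = inject_label_noise(deduplicated, noise_level=0.2, seed=42)

print(len(merged), len(deduplicated))
print(sum(a[1] != b[1] for a, b in zip(deduplicated, noisy)), "label(s) flipped")
```

Fixing the random seed keeps the noisy variant reproducible, which matters when the same noise level has to be applied consistently before comparing several pre-processing methods.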
