Applied Sciences, Vol. 11, Issue 6 (2021)

EvoSplit: An Evolutionary Approach to Split a Multi-Label Data Set into Disjoint Subsets

Francisco Florez-Revuelta    

Abstract

This paper presents EvoSplit, a new evolutionary approach for distributing multi-label data sets into disjoint subsets for supervised machine learning. Currently, data set providers either divide a data set randomly or use iterative stratification, a method that aims to preserve the label (or label pair) distribution of the original data set in each of the subsets. With the same aim, this paper first introduces a single-objective evolutionary approach that seeks a split maximizing the similarity between those distributions, treating the label and label pair distributions independently. Second, it presents a new multi-objective evolutionary algorithm that maximizes the similarity of both distributions (labels and label pairs) simultaneously. Both approaches are validated on well-known multi-label data sets as well as on large image data sets currently used in computer vision and machine learning applications. Compared with iterative stratification, EvoSplit produces better splits according to several measures: Label Distribution, Label Pair Distribution, Examples Distribution, and the number of folds and fold-label pairs with zero positive examples.
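To make the single-objective idea concrete, the sketch below assumes that the Label Distribution (LD) measure is the average, over labels and subsets, of the absolute difference between the positive-to-negative example ratio of a label in a subset and in the whole data set (the form commonly used to evaluate iterative stratification; treat it as an assumption here). It then minimizes LD with a simple (1+1) evolutionary loop whose swap mutation exchanges the subsets of two examples, so subset sizes never change. This is a hypothetical illustration, not EvoSplit's actual encoding, operators, or parameters; `label_distribution`, `evolve_split`, `_ratio`, and the toy data are invented for the example.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical toy representation: one 0/1 label vector per example.
Labels = Sequence[Sequence[int]]


def _ratio(pos: int, neg: int) -> float:
    # Positive-to-negative ratio; returning pos when neg == 0 is an ad hoc
    # guard added for this sketch, not part of any published definition.
    return pos / neg if neg else float(pos)


def label_distribution(y: Labels, assignment: Sequence[int], k: int) -> float:
    """Assumed LD measure: average over labels j and subsets i of
    |ratio_j(subset i) - ratio_j(whole data set)|. Lower is better."""
    n, q = len(y), len(y[0])
    sizes = [0] * k
    for fold in assignment:
        sizes[fold] += 1
    total = 0.0
    for j in range(q):
        d_pos = sum(row[j] for row in y)
        d_ratio = _ratio(d_pos, n - d_pos)
        s_pos = [0] * k
        for row, fold in zip(y, assignment):
            s_pos[fold] += row[j]
        total += sum(
            abs(_ratio(s_pos[i], sizes[i] - s_pos[i]) - d_ratio) for i in range(k)
        ) / k
    return total / q


def evolve_split(
    y: Labels, k: int, generations: int = 2000, seed: int = 0
) -> Tuple[List[int], float]:
    """(1+1) evolutionary search over fold assignments, minimizing LD."""
    rng = random.Random(seed)
    n = len(y)
    # Initial individual: random assignment with fold sizes differing by at most 1.
    parent = [i % k for i in range(n)]
    rng.shuffle(parent)
    best = label_distribution(y, parent, k)
    for _ in range(generations):
        a, b = rng.sample(range(n), 2)
        if parent[a] == parent[b]:
            continue  # swapping within one fold would change nothing
        child = parent[:]
        # Swap mutation: exchanging two examples keeps every fold size fixed.
        child[a], child[b] = child[b], child[a]
        fitness = label_distribution(y, child, k)
        if fitness <= best:  # keep the child unless it is worse
            parent, best = child, fitness
    return parent, best


if __name__ == "__main__":
    data_rng = random.Random(42)
    # Toy multi-label data: 60 examples, 3 labels with different frequencies.
    y = [[int(data_rng.random() < p) for p in (0.5, 0.2, 0.1)] for _ in range(60)]
    split, ld = evolve_split(y, k=3)
    print(f"LD of evolved 3-fold split: {ld:.4f}")
```

Because the swap mutation never changes subset sizes, the Examples Distribution stays fixed and only LD is optimized. Replacing `label_distribution` with an analogous measure over label pairs would give a label-pair variant; optimizing both measures at once, as the multi-objective algorithm does, would call for a method that keeps a set of non-dominated solutions rather than this single scalar fitness.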

Similar articles

Andry Sedelnikov, Evgenii Kurkin, Jose Gabriel Quijada-Pioquinto, Oleg Lukyanov, Dmitrii Nazarov, Vladislava Chertykovtseva, Ekaterina Kurkina and Van Hung Hoang    
This paper describes the development of a methodology for air propeller optimization using Bezier curves to describe blade geometry. The proposed approach allows for more flexibility in setting the propeller shape, for example, using a variable airfoil o...
Journal: Computation

 
António Pedro Branco, Cátia Vaz and Alexandre P. Francisco    
There are several tools available to infer phylogenetic trees, which depict the evolutionary relationships among biological entities such as viral and bacterial strains in infectious outbreaks or cancerous cells in tumor progression trees. These tools re...
Journal: Algorithms

 
Juan Ma, Qiang Yang, Mingzhi Zhang, Yao Chen, Wenyi Zhao, Chengyu Ouyang and Dongping Ming    
Accurately predicting landslide deformation based on monitoring data is key to successful early warning of landslide disasters. Landslide displacement-time curves offer an intuitive reflection of the landslide motion process and deformation predictions o...
Journal: Water

 
Linghui Hu, Na Yao, Chengxin Wang, Liting Yang, Gulden Serekbol, Bin Huo, Xuelian Qiu, Fangze Zi, Yong Song and Shengao Chen    
To study the morphological differences between and the evolutionary mechanisms driving the differentiation of geographically distinct populations of Gymnodiptychus dybowskii, 158 fish were collected from the Turks River and the Manas River in Xinjiang fr...
Journal: Water

 
Christian Haubelt, Luise Müller, Kai Neubauer, Torsten Schaub and Philipp Wanko    
We address the problem of evolutionary system design (ESD) by means of answer set programming modulo difference constraints (AMT). The goal of this design approach is to synthesize new product variants or generations from existing products. We start by f...
Journal: Algorithms