Algorithms, Vol. 16, No. 5 (2023)

Subgroup Discovery in Machine Learning Problems with Formal Concepts Analysis and Test Theory Algorithms

Igor Masich, Natalya Rezova, Guzel Shkaberina, Sergei Mironov, Mariya Bartosh and Lev Kazakovtsev

Abstract

Many real-world problems of automatic grouping of objects (clustering) require a well-founded solution and a result that can be interpreted. A more specific problem is the identification of homogeneous subgroups of objects: the number of groups in the dataset is not specified in advance, and the proposed grouping model must be justified and described. As a tool for interpretable machine learning, we consider formal concept analysis (FCA). To reduce a problem with real-valued attributes to one to which FCA can be applied, we search for the optimal number and location of cut points and optimize the support set of attributes. The approach to identifying homogeneous subgroups was tested on tasks for which interpretability is important: clustering industrial products (for example, transistors, diodes, and microcircuits) according to primary tests, and gene expression data collected for predicting cancerous tumors. For the data under consideration, logical concepts are identified and organized into a lattice of formal concepts. The revealed concepts are evaluated by informativeness indicators and can be regarded as homogeneous subgroups of elements together with their indicative descriptions. The proposed approach thus singles out homogeneous subgroups of elements and describes their characteristics, which can be regarded as stricter norms that the elements of the subgroup satisfy. A comparison is made with COBWEB, a conceptual clustering algorithm aimed at discovering probabilistic concepts. The resulting lattices of logical concepts and probabilistic concepts for the considered datasets are simple and easy to interpret.
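The two steps named in the abstract, binarizing real-valued attributes with cut points and then extracting formal concepts, can be illustrated with a minimal sketch. This is not the authors' algorithm: the cut points are fixed by hand rather than optimized, the concepts are enumerated by brute force, and the data and all names (`binarize`, `formal_concepts`, `rows`, `cuts`) are hypothetical.

```python
from itertools import combinations

def binarize(rows, cut_points):
    """Map real-valued rows to sets of binary attributes such as 'x0>=2.0'.

    cut_points maps a column index to a list of thresholds. Here they are
    assumed to be given; the paper searches for them by optimization.
    """
    context = []
    for row in rows:
        attrs = set()
        for col, thresholds in cut_points.items():
            for t in thresholds:
                if row[col] >= t:
                    attrs.add(f"x{col}>={t}")
        context.append(frozenset(attrs))
    return context

def formal_concepts(context):
    """Return all (extent, intent) pairs of a binary context.

    Brute force: close every subset of objects. Exponential in the number
    of objects, so suitable only for toy examples.
    """
    all_attrs = frozenset().union(*context)
    n = len(context)
    concepts = set()
    for k in range(n + 1):
        for objs in combinations(range(n), k):
            # Intent: attributes shared by every object in the subset.
            intent = all_attrs
            for i in objs:
                intent = intent & context[i]
            # Extent: every object that has all attributes of the intent.
            extent = frozenset(i for i in range(n) if intent <= context[i])
            concepts.add((extent, intent))
    return concepts

# Hypothetical toy data: four objects, two real-valued attributes.
rows = [(1.0, 5.0), (1.2, 5.5), (3.0, 1.0), (3.2, 0.8)]
cuts = {0: [2.0], 1: [3.0]}

ctx = binarize(rows, cuts)
concepts = formal_concepts(ctx)
for extent, intent in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))
```

On this toy data each non-trivial concept pairs a subgroup of objects (the extent) with the threshold conditions they all satisfy (the intent), which is the kind of interpretable subgroup description the abstract refers to.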
