Portada: Infraestructura para la Logística Sustentable 2050
DESTACADO | CPI Propone - Resumen Ejecutivo

Infraestructura para el desarrollo que queremos 2026-2030

Elaborado por el Consejo de Políticas de Infraestructura (CPI), este documento constituye una hoja de ruta estratégica para orientar la inversión y la gestión de infraestructura en Chile. Presenta propuestas organizadas en siete ejes estratégicos, sin centrarse en proyectos específicos, sino en influir en las decisiones de política pública para promover una infraestructura que conecte territorios, genere oportunidades y eleve la calidad de vida de la población.
Redirigiendo al acceso original de articulo en 23 segundos...
Inicio  /  AI  /  Vol: 6 Par: 1 (2025)
ARTÍCULO
TITULO

Feature Selection in Cancer Classification: Utilizing Explainable Artificial Intelligence to Uncover Influential Genes in Machine Learning Models

Matheus Dalmolin    
Karolayne S. Azevedo    
Luísa C. de Souza    
Caroline B. de Farias    
Martina Lichtenfels and Marcelo A. C. Fernandes    

Resumen

This study investigates the use of machine learning (ML) models combined with explainable artificial intelligence (XAI) techniques to identify the most influential genes in the classification of five recurrent cancer types in women: breast cancer (BRCA), lung adenocarcinoma (LUAD), thyroid cancer (THCA), ovarian cancer (OV), and colon adenocarcinoma (COAD). Gene expression data from RNA-seq, extracted from The Cancer Genome Atlas (TCGA), were used to train ML models, including decision trees (DTs), random forest (RF), and XGBoost (XGB), which achieved accuracies of 98.69%, 99.82%, and 99.37%, respectively. However, the challenges in this analysis included the high dimensionality of the dataset and the lack of transparency in the ML models. To mitigate these challenges, the SHAP (Shapley Additive Explanations) method was applied to generate a list of features, aiming to understand which characteristics influenced the models? decision-making processes and, consequently, the prediction results for the five tumor types. The SHAP analysis identified 119, 80, and 10 genes for the RF, XGB, and DT models, respectively, totaling 209 genes, resulting in 172 unique genes. The new list, representing 0.8% of the original input features, is coherent and fully explainable, increasing confidence in the applied models. Additionally, the results suggest that the SHAP method can be effectively used as a feature selector in gene expression data. This approach not only enhances model transparency but also maintains high classification performance, highlighting its potential in identifying biologically relevant features that may serve as biomarkers for cancer diagnostics and treatment planning.

Artículos similares

Hemos preparados una selección de otros artículos que pudieran ser de tu interés
Mohammad Subhi Al-Batah,Belal Mohammad Zaqaibeh,Saleh Ali Alomari,Mowafaq Salem Alzboon     Pág. pp. 62 - 73
Gene microarray classification problems are considered a challenge task since the datasets contain few number of samples with high number of genes (features). The genes subset selection in microarray data play an important role for minimizing the computa... ver más
Werner Mostert, Katherine M. Malan and Andries P. Engelbrecht    
This study presents a novel performance metric for feature selection algorithms that is unbiased and can be used for comparative analysis across feature selection problems. The baseline fitness improvement (BFI) measure quantifies the potential value gai... ver más
Revista: Algorithms
Jinghui Feng, Haopeng Kuang and Lihua Zhang    
Feature selection can efficiently improve classification accuracy and reduce the dimension of datasets. However, feature selection is a challenging and complex task that requires a high-performance optimization algorithm. In this paper, we propose an enh... ver más
Revista: Future Internet
Nikita Pilnenskiy and Ivan Smetannikov    
With the current trend of rapidly growing popularity of the Python programming language for machine learning applications, the gap between machine learning engineer needs and existing Python tools increases. Especially, it is noticeable for more classica... ver más
Revista: Future Internet
Reehan Ali Shah, Yuntao Qian, Dileep Kumar, Munwar Ali and Muhammad Bux Alvi    
Intrusion detection system (IDS) is a well-known and effective component of network security that provides transactions upon the network systems with security and safety. Most of earlier research has addressed difficulties such as overfitting, feature re... ver más
Revista: Future Internet