Future Internet, Vol. 15, Issue 3 (2023)

Machine Learning for Data Center Optimizations: Feature Selection Using Shapley Additive exPlanation (SHAP)

Yibrah Gebreyesus, Damian Dalton, Sebastian Nixon, Davide De Chiara and Marta Chinnici

Abstract

The need for artificial intelligence (AI) and machine learning (ML) models to optimize data center (DC) operations grows as the volume of operations management data increases dramatically. These strategies can help operators better understand their DC operations and make informed decisions in advance to maintain service reliability and availability. They include developing models that optimize energy efficiency, identifying inefficient resource utilization and scheduling policies, and predicting outages. In addition to model hyperparameter tuning, feature subset selection (FSS) is critical for identifying the features relevant to modeling DC operations effectively: it provides insight into the data, optimizes model performance, and reduces computational cost. Hence, this paper introduces the Shapley Additive exPlanation (SHAP) values method, a class of additive feature attribution values, for identifying relevant features, an application that is rarely discussed in the literature. We compared its effectiveness with several commonly used importance-based feature selection methods. The methods were tested on real DC operations data streams obtained from the ENEA CRESCO6 cluster with 20,832 cores. To demonstrate the effectiveness of SHAP compared with the other methods, we selected the ten most important features from each method, retrained the predictive models, and evaluated their performance using the mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE). The results demonstrate that the predictive models trained on features selected with the SHAP-assisted method performed well, achieving lower error and a reasonable execution time compared with the other methods.
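The SHAP-based selection step summarized above (rank features by importance, keep the top ones, retrain, score with MAE, RMSE and MAPE) can be sketched without external libraries. This is a minimal illustration under stated assumptions, not the authors' implementation: it computes exact Shapley values by enumerating every feature coalition, assumes an "absent" feature takes a single baseline (background) value, ranks features by mean absolute SHAP value, and keeps the top k. The toy linear model, feature names, data, and predictions are all hypothetical, and the retraining step itself is omitted.

```python
from itertools import combinations
from math import factorial, sqrt
from typing import Callable, Sequence

Model = Callable[[Sequence[float]], float]


def shapley_values(model: Model, x: Sequence[float], baseline: Sequence[float]) -> list[float]:
    """Exact Shapley values for one sample x.

    Assumption: a feature absent from a coalition takes its baseline value
    (single-background-point value function). Needs O(n * 2^n) model calls,
    so it is only feasible for a handful of features.
    """
    n = len(x)

    def value(coalition: frozenset) -> float:
        # Features in the coalition come from x; all others from the baseline.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def rank_by_mean_abs_shap(model: Model, X: Sequence[Sequence[float]],
                          baseline: Sequence[float], names: Sequence[str]) -> list[str]:
    """Order feature names by mean |SHAP value| over the samples, largest first."""
    totals = [0.0] * len(names)
    for x in X:
        for j, v in enumerate(shapley_values(model, x, baseline)):
            totals[j] += abs(v)
    order = sorted(range(len(names)), key=lambda j: totals[j], reverse=True)
    return [names[j] for j in order]


def mae(y: Sequence[float], y_hat: Sequence[float]) -> float:
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def rmse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    return sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def mape(y: Sequence[float], y_hat: Sequence[float]) -> float:
    # Mean absolute percentage error; assumes no true value is zero.
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y, y_hat)) / len(y)


# Hypothetical feature names and a toy linear "model" standing in for a trained regressor.
names = ["cpu_load", "fan_speed", "job_count", "inlet_temp"]
weights = [3.0, 0.5, 0.0, 2.0]


def model(z: Sequence[float]) -> float:
    return sum(w * v for w, v in zip(weights, z))


X = [[1.0, 2.0, 5.0, 0.5], [2.0, 0.0, 1.0, 1.5], [0.0, 4.0, 3.0, 2.0]]
baseline = [sum(col) / len(X) for col in zip(*X)]  # column means as the background point

ranking = rank_by_mean_abs_shap(model, X, baseline, names)
top_k = ranking[:2]  # the paper keeps the top ten; k = 2 suits this four-feature toy
print(ranking)  # ['cpu_load', 'inlet_temp', 'fan_speed', 'job_count']

# Hypothetical held-out targets and predictions from a model retrained on top_k.
y_true = [1.0, 2.0, 4.0]
y_pred = [1.0, 3.0, 2.0]
print(mae(y_true, y_pred), rmse(y_true, y_pred), mape(y_true, y_pred))
```

Exact enumeration grows exponentially with the number of features, so for a real DC feature set one would normally use an approximating explainer, such as those provided by the `shap` Python package (for example `TreeExplainer` for tree ensembles). For a linear model under this value function, each SHAP value reduces to w_i(x_i - b_i), which makes a convenient sanity check.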

Similar articles

Zhenzhen Di, Miao Chang, Peikun Guo, Yang Li and Yin Chang    
Most worldwide industrial wastewater, including in China, is still directly discharged to aquatic environments without adequate treatment. Because of a lack of data and few methods, the relationships between pollutants discharged in wastewater and those ...
Journal: Water

 
Ognjen Radovic, Srdan Marinkovic, Jelena Radojicic
Credit scoring attracts special attention of financial institutions. In recent years, deep learning methods have been particularly interesting. In this paper, we compare the performance of ensemble deep learning methods based on decision trees with the b...

 
Pablo de Llano, Carlos Piñeiro, Manuel Rodríguez (pp. 163-198)
This paper offers a comparative analysis of the effectiveness of eight popular forecasting methods: univariate, linear, discriminant and logit regression; recursive partitioning, rough sets, artificial neural networks, and DEA. Our goals are: clarify the...

 
Hugo López-Fernández (pp. 22-25)
Mass spectrometry using matrix assisted laser desorption ionization coupled to time of flight analyzers (MALDI-TOF MS) has become popular during the last decade due to its high speed, sensitivity and robustness for detecting proteins and peptides. This a...

 
Rejath Jose, Faiz Syed, Anvin Thomas and Milan Toma    
The advancement of machine learning in healthcare offers significant potential for enhancing disease prediction and management. This study harnesses the PyCaret library, a Python-based machine learning toolkit, to construct and refine predictive models for...
Journal: Applied Sciences