Applied Sciences, Vol. 13, No. 3 (2023)
ARTICLE
TITLE

Hyperparameter Optimization of Ensemble Models for Spam Email Detection

Temidayo Oluwatosin Omotehinwa and David Opeoluwa Oyewola    

Abstract

Unsolicited emails, popularly referred to as spam, remain one of the biggest threats to cybersecurity globally. More than half of the emails sent in 2021 were spam, resulting in huge financial losses. The tenacity and perpetual presence of the adversary, the spammer, call for improved spam filtering. This study therefore developed baseline random forest (RF) and extreme gradient boosting (XGBoost) ensemble models for detecting and classifying spam emails using the Enron1 dataset. The ensemble models were then optimized with grid-search cross-validation, which searches the hyperparameter space for optimal hyperparameter values. The performance of the baseline (untuned) and tuned models of both algorithms was evaluated and compared, and the impact of hyperparameter tuning on both models was examined. The experiments showed that hyperparameter tuning improved the performance of both models relative to their baselines. The tuned RF and XGBoost models achieved an accuracy of 97.78% and 98.09%, a sensitivity of 98.44% and 98.84%, and an F1 score of 97.85% and 98.16%, respectively. The XGBoost model outperformed the random forest model. The developed XGBoost model is effective and efficient for spam email detection.
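
The grid-search cross-validation procedure and the reported metrics (accuracy, sensitivity, F1 score) can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration: the toy k-nearest-neighbour classifier `ToyKNN` stands in for the random forest and XGBoost models, the single "spam-word count" feature and its values are invented rather than taken from Enron1, and the helper names (`scores`, `k_fold_indices`, `grid_search_cv`) are hypothetical. The abstract does not say which implementation or hyperparameter grid the authors used.

```python
from itertools import product


def scores(y_true, y_pred):
    """Accuracy, sensitivity (recall on spam = 1), and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "f1": 2 * precision * sensitivity / denom if denom else 0.0,
    }


class ToyKNN:
    """Hypothetical stand-in model with one hyperparameter, n_neighbors."""

    def __init__(self, n_neighbors=1):
        self.n_neighbors = n_neighbors

    def fit(self, X, y):
        self.data = list(zip(X, y))
        return self

    def predict(self, X):
        preds = []
        for x in X:
            nearest = sorted(self.data, key=lambda d: abs(d[0] - x))
            nearest = nearest[: self.n_neighbors]
            votes = sum(label for _, label in nearest)
            preds.append(1 if 2 * votes > len(nearest) else 0)
        return preds


def k_fold_indices(n, k):
    """Yield (train, test) index lists for k interleaved folds, no shuffling."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]


def grid_search_cv(make_model, param_grid, X, y, k=5, metric="accuracy"):
    """Try every combination in param_grid; keep the best mean CV score."""
    names = list(param_grid)
    best_params, best_score = None, -1.0
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        fold_scores = []
        for train, test in k_fold_indices(len(X), k):
            model = make_model(**params)
            model.fit([X[i] for i in train], [y[i] for i in train])
            preds = model.predict([X[i] for i in test])
            fold_scores.append(scores([y[i] for i in test], preds)[metric])
        mean_score = sum(fold_scores) / len(fold_scores)
        if mean_score > best_score:  # ties keep the earlier combination
            best_params, best_score = params, mean_score
    return best_params, best_score


# Invented toy data: one feature per email (count of "spammy" words).
X_demo = [0, 1, 0, 2, 1, 3, 0, 1, 2, 4, 5, 7, 3, 8, 6, 9, 4, 7, 10, 6]
y_demo = [0] * 10 + [1] * 10  # 0 = ham, 1 = spam
best_params, best_score = grid_search_cv(
    ToyKNN, {"n_neighbors": [1, 3, 5]}, X_demo, y_demo, k=5, metric="f1"
)
print(best_params, round(best_score, 4))
```

In practice a library routine such as scikit-learn's `GridSearchCV` would typically replace the hand-written loop, with the grid ranging over the ensemble model's own hyperparameters instead of `n_neighbors`.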
