ARTICLE
TITLE

Spark Configurations to Optimize Decision Tree Classification on UNSW-NB15

Sikha Bagui    
Mary Walauskis    
Robert DeRush    
Huyen Praviset and Shaunda Boucugnani    

Abstract

This paper examines how changing Spark's configuration parameters affects machine learning algorithms on a large dataset, the UNSW-NB15 dataset. The environmental conditions that optimize the classification process are studied, since building smart intrusion detection systems requires a deep understanding of these environmental parameters. Specifically, the study focuses on executor memory, the number of executors, and the number of cores per executor, and on how they affect execution time and statistical measures. The objective was to optimize resource usage and minimize processing time for Decision Tree classification using Spark, and to determine whether additional resources increase performance, lower processing time, and make better use of computing resources. Because UNSW-NB15 is a large dataset, it provides enough data and complexity to reveal the effects of different computing resource configurations in Spark. Principal Component Analysis was used to preprocess the dataset. Results indicated that too few executors and cores result in wasted resources and long processing times, while excessive resource allocation did not improve processing time. Environmental tuning has a noticeable impact.
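
The abstract names three executor-level settings: executor memory, number of executors, and cores per executor. The following is a minimal sketch, assuming a plain Python driver script, of how such a configuration sweep could be enumerated before each run of a PCA plus Decision Tree job (that job is not shown). The grid values and the `max_cores` cluster limit are hypothetical, not the paper's settings; `spark.executor.instances`, `spark.executor.cores`, and `spark.executor.memory` are standard Spark configuration keys, and `--num-executors`, `--executor-cores`, and `--executor-memory` are the matching `spark-submit` flags.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class ExecutorConfig:
    """One candidate executor layout to benchmark (hypothetical values)."""
    num_executors: int
    cores_per_executor: int
    memory_gb: int

    @property
    def total_cores(self) -> int:
        # Upper bound on concurrently running tasks: executors x cores each.
        return self.num_executors * self.cores_per_executor

    @property
    def total_memory_gb(self) -> int:
        # Executor heap requested across the cluster (memory overhead excluded).
        return self.num_executors * self.memory_gb

    def to_spark_conf(self) -> dict[str, str]:
        # Standard Spark property names; Spark expects string values.
        return {
            "spark.executor.instances": str(self.num_executors),
            "spark.executor.cores": str(self.cores_per_executor),
            "spark.executor.memory": f"{self.memory_gb}g",
        }

    def to_submit_args(self) -> list[str]:
        # Equivalent spark-submit flags (--num-executors applies on YARN/Kubernetes).
        return [
            "--num-executors", str(self.num_executors),
            "--executor-cores", str(self.cores_per_executor),
            "--executor-memory", f"{self.memory_gb}g",
        ]


def build_grid(executors, cores, memory_gb, max_cores=None):
    """Cartesian product of settings, optionally dropping layouts that request
    more cores than a (hypothetical) cluster limit allows."""
    grid = [ExecutorConfig(e, c, m) for e, c, m in product(executors, cores, memory_gb)]
    if max_cores is not None:
        grid = [cfg for cfg in grid if cfg.total_cores <= max_cores]
    return grid


if __name__ == "__main__":
    sweep = build_grid(executors=[1, 2, 4], cores=[1, 2, 4], memory_gb=[2, 4], max_cores=8)
    for cfg in sweep:
        print(cfg.total_cores, " ".join(cfg.to_submit_args()))
```

Each printed layout would then be submitted and timed. Tracking `total_cores` alongside wall-clock time gives one way to observe the pattern the abstract reports: layouts with too few executors and cores run slowly, while adding parallelism beyond what the job can use does not shorten processing time.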

Similar articles

Shuo Gao, Xirui Kang, Yaping Li, Jinpeng Yu, Hui Wang, Hong Pan, Quangang Yang, Zhongchen Yang, Yajie Sun, Yuping Zhuge and Yanhong Lou
Cadmium (Cd) water pollution threatens environmental systems and human health. Adsorption is the preferred method for purifying water bodies polluted by Cd, and the development of effective adsorption materials is critical. The performance of original ph...
Journal: Water

Minh-Hoang Nguyen, Tam-Tri Le and Quan-Hoang Vuong
Modern society faces major environmental problems, but there are many difficulties in studying the nature-human relationship from an integral psychosocial perspective. We propose the ecomind sponge conceptual framework, based on the mindsponge theory of ...
Journal: Urban Science

Owen Tamin, Ervin Gubin Moung, Jamal Ahmad Dargham, Farashazillah Yahya, Ali Farzamnia, Florence Sia, Nur Faraha Mohd Naim and Lorita Angeline
Plastic waste is a growing environmental concern that poses a significant threat to onshore ecosystems, human health, and wildlife. The accumulation of plastic waste in oceans has reached a staggering estimate of over eight million tons annually, leading...

Caitlin Walls, Almy Ruzni Keumala Putri and Gesa Beck
Material Flow Cost Accounting (MFCA) is an environmental management accounting method that allocates costs to material and energy flows through a process, thereby enabling a simultaneous reduction in environmental impacts alongside an improvement in busi...

Xiazhong Zheng, Yu Wang, Yun Chen, Qin Zeng and Lianghai Jin
Improving the hazard identification ability of workers is an important way to reduce safety accidents at construction sites. Although previous studies have succeeded in improving hazard identification performance, an important gap is that they consider o...
Journal: Buildings