Information, Vol. 13, No. 11 (2022)

A Spark-Based Artificial Bee Colony Algorithm for Unbalanced Large Data Classification

Jamil Al-Sawwa and Mohammad Almseidin    

Abstract

With the rapid development of internet technology, the amount of collected or generated data has increased exponentially. The sheer volume, complexity, and unbalanced nature of this data make it challenging for the scientific community to extract meaningful information within a reasonable time. In this paper, we implemented a scalable design of the artificial bee colony algorithm for big data classification using Apache Spark. In addition, a new fitness function is proposed to handle unbalanced data. Two experiments were performed on real unbalanced datasets to assess the performance and scalability of the proposed algorithm. The performance results reveal that the proposed fitness function deals efficiently with unbalanced datasets and statistically outperforms the existing fitness function in terms of G-mean and F1-score. The scalability results demonstrate that the proposed Spark-based design achieves speedup and scaleup that are very close to optimal, and that it scales efficiently with increasing data size.
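To illustrate why G-mean and F1-score are reported for unbalanced data instead of plain accuracy, here is a minimal sketch using the standard binary-class definitions of these metrics. It is not the fitness function proposed in the paper, which the abstract does not specify. The function name `binary_metrics` and the confusion-matrix counts are hypothetical, chosen only for illustration.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary metrics from confusion-matrix counts (illustrative helper)."""
    total = tp + fp + tn + fn
    # Recall (sensitivity): fraction of actual positives that were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of actual negatives that were correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Precision: fraction of predicted positives that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    # G-mean: geometric mean of recall and specificity.
    g_mean = math.sqrt(recall * specificity)
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "g_mean": g_mean, "f1": f1}


# Hypothetical dataset: 990 negatives and 10 positives.
# A classifier that labels everything negative looks accurate but finds nothing.
all_negative = binary_metrics(tp=0, fp=0, tn=990, fn=10)

# A classifier that finds 8 of the 10 positives at the cost of 20 false alarms.
finds_minority = binary_metrics(tp=8, fp=20, tn=970, fn=2)

print(all_negative)
print(finds_minority)
```

In this made-up example the all-negative classifier has the higher accuracy, while its G-mean and F1-score are both zero. The second classifier scores lower on accuracy but higher on both G-mean and F1-score, which is the behavior these metrics are meant to reward on unbalanced data.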

Similar articles

       
 
Xinbo Huang, Zhiwei Song, Chao Ji, Ye Zhang and Luya Yang    
Different types of surface defects will occur during the production of strip steel. To ensure production quality, it is essential to classify these defects. Our research indicates that two main problems exist in the existing strip steel surface defect cl...
Journal: Algorithms

 
Shenghan Zhou, Tianhuai Wang, Linchao Yang, Zhao He and Siting Cao    
This paper aims to build a Self-supervised Fault Detection Model for UAVs combined with an Auto-Encoder. With the development of data science, it is imperative to detect UAV faults and improve their safety. Many factors affect the fault of a UAV, such as...
Journal: Aerospace

 
Antonio Maci, Alessandro Santorsola, Antonio Coscia and Andrea Iannacone    
Web phishing is a form of cybercrime aimed at tricking people into visiting malicious URLs to exfiltrate sensitive data. Since the structure of a malicious URL evolves over time, phishing detection mechanisms that can adapt to such variations are paramou...
Journal: Computers

 
Omar Azib Alkhudaydi, Moez Krichen and Ans D. Alghamdi    
With the increasing severity and frequency of cyberattacks, the rapid expansion of smart objects intensifies cybersecurity threats. The vast communication traffic data between Internet of Things (IoT) devices presents a considerable challenge in defendin...
Journal: Information

 
Omer W. Taha and Yefa Hu    
As an essential enabling technology to realize advanced concepts such as digitization, intelligence, and service, information technology plays a critical role in shaping modern society and driving innovation across various industries and domains. The con...
Journal: Applied Sciences