ARTICLE
TITLE

CDDM: Concept Drift Detection Model for Data Stream

Mashail Shaeel Althabiti    
Manal Abdullah    

Abstract

Data streams are the huge amounts of data generated in various fields, including financial processes, social media activities, Internet of Things applications, and many others. Such data cannot be processed with traditional data mining algorithms because of several constraints, including limited memory, data speed, and a dynamic environment. Concept Drift is known as the main constraint of data stream mining, mainly in the classification task. It refers to a change in the underlying distribution of the data stream over time, which degrades the accuracy of classification models and leads to wrong predictions. Spam emails, consumer behavior changes, and adversary activities are examples of Concept Drift. This paper introduces a Concept Drift detection model, the Concept Drift Detection Model (CDDM). It monitors the accuracy of the classification model over a sliding window, on the assumption that a decline in accuracy indicates that a drift has occurred. A weighted version of CDDM, called W-CDDM, is also proposed. Both models have been evaluated on two real datasets and four artificial datasets. For abrupt drift, the experimental results show that CDDM outperforms the other models on the dataset of 100K instances, and W-CDDM does so on the dataset of 1M instances. For gradual drift, W-CDDM surpasses the rest in terms of accuracy, run time, and detection delay on the dataset of 100K instances, while on the dataset of 1M instances CDDM achieves the highest accuracy using the NB classifier. Moreover, W-CDDM achieves the highest accuracy on the real datasets.
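
The idea of monitoring accuracy over a sliding window can be illustrated with a minimal Python sketch. This is not the authors' CDDM or W-CDDM algorithm, whose exact decision rule is not given in the abstract. The class name, the window size, and the rule "signal drift when windowed accuracy falls more than a fixed margin below the best accuracy seen so far" are all assumptions chosen for illustration.

```python
from collections import deque


class SlidingWindowDriftMonitor:
    """Hypothetical accuracy-based drift monitor (illustrative, not CDDM itself)."""

    def __init__(self, window_size=100, drop_threshold=0.2):
        # window_size and drop_threshold are assumed values, not taken from the paper.
        self.window = deque(maxlen=window_size)
        self.drop_threshold = drop_threshold
        self.best_accuracy = 0.0

    def update(self, correct):
        """Record one prediction outcome; return True if a drift is signalled."""
        self.window.append(1 if correct else 0)
        if len(self.window) < self.window.maxlen:
            return False  # not enough evidence until the window is full
        accuracy = sum(self.window) / len(self.window)
        self.best_accuracy = max(self.best_accuracy, accuracy)
        if self.best_accuracy - accuracy > self.drop_threshold:
            # Assumed reaction: forget old statistics and start over.
            self.window.clear()
            self.best_accuracy = 0.0
            return True
        return False


def detect_drifts(outcomes, window_size=100, drop_threshold=0.2):
    """Return the stream positions at which the monitor signals a drift."""
    monitor = SlidingWindowDriftMonitor(window_size, drop_threshold)
    return [i for i, ok in enumerate(outcomes) if monitor.update(ok)]


if __name__ == "__main__":
    # Synthetic abrupt drift: the classifier is right for 200 instances, then wrong.
    stream = [True] * 200 + [False] * 200
    print(detect_drifts(stream))
```

In a real pipeline each outcome would come from comparing the classifier's prediction with the true label as it arrives, and a signalled drift would typically trigger retraining or replacing the classifier.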
