ARTICLE
TITLE

Data Stream Clustering Techniques, Applications, and Models: Comparative Analysis and Discussion

Umesh Kokate    
Arvind Deshpande    
Parikshit Mahalle and Pramod Patil    

Abstract

Data growth in today's world is exponential: many applications, such as smart grids, sensor networks, video surveillance, financial systems, medical science data, web click streams, and network data, generate huge amounts of streaming data at very high speed. In traditional data mining, the data set is generally static and can be accessed many times for processing and analysis. Data stream mining, by contrast, must satisfy constraints of real-time response, bounded memory, single-pass processing, and concept-drift detection. The main problem is discovering the hidden patterns and knowledge in continuous data streams that are needed to understand their context and identify trends. In this paper, various data stream clustering methods and algorithms are reviewed and evaluated on standard synthetic and real-life data streams. Density-based micro-clustering and density-grid-based clustering algorithms are discussed, and a comparative analysis is performed using various internal and external clustering evaluation measures. It was observed that no single algorithm satisfies all the performance measures. The performance of these data stream clustering algorithms is domain-specific, and they require many parameters, such as density and noise thresholds.
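The abstract names single-pass processing, bounded memory, and density micro-clustering as the core constraints and technique. As a rough illustration only (not the authors' code and not any of the algorithms they benchmark), the sketch below keeps exponentially faded micro-cluster summaries, loosely in the spirit of DenStream-style density micro-clustering. The class names `MicroCluster` and `StreamMicroClusterer` and the parameters `eps`, `lam`, `min_weight`, and `max_clusters` are hypothetical; they stand in for the density and noise thresholds the abstract says these algorithms depend on.

```python
import math
from dataclasses import dataclass


@dataclass
class MicroCluster:
    """Faded summary of nearby points: total weight and weighted linear sum."""
    weight: float
    ls: list            # weighted linear sum of absorbed points
    last_update: float  # timestamp of the last fade

    def fade(self, now: float, lam: float) -> None:
        # Exponential fading 2^(-lam * dt): older points count for less.
        factor = 2.0 ** (-lam * (now - self.last_update))
        self.weight *= factor
        self.ls = [v * factor for v in self.ls]
        self.last_update = now

    def center(self) -> list:
        return [v / self.weight for v in self.ls]


class StreamMicroClusterer:
    """Single-pass micro-clustering with bounded memory (hypothetical parameters)."""

    def __init__(self, eps: float, lam: float, min_weight: float, max_clusters: int):
        if not 0.0 < min_weight <= 1.0:
            # A new cluster starts at weight 1.0, so it must survive the noise threshold.
            raise ValueError("min_weight must be in (0, 1]")
        self.eps = eps                    # absorption radius (assumed)
        self.lam = lam                    # fading rate (assumed)
        self.min_weight = min_weight      # noise threshold for faded clusters (assumed)
        self.max_clusters = max_clusters  # hard memory bound (assumed)
        self.clusters = []

    def insert(self, x, now: float) -> None:
        # Fade every summary to the current time, then drop clusters that became noise.
        for mc in self.clusters:
            mc.fade(now, self.lam)
        self.clusters = [mc for mc in self.clusters if mc.weight >= self.min_weight]
        # The point is seen once: absorb it into the nearest cluster within eps, or start a new one.
        nearest = min(self.clusters, key=lambda mc: math.dist(mc.center(), x), default=None)
        if nearest is not None and math.dist(nearest.center(), x) <= self.eps:
            nearest.weight += 1.0
            nearest.ls = [a + b for a, b in zip(nearest.ls, x)]
        else:
            self.clusters.append(MicroCluster(1.0, [float(v) for v in x], now))
        # Enforce the memory bound by evicting the lightest summaries.
        if len(self.clusters) > self.max_clusters:
            self.clusters.sort(key=lambda mc: mc.weight, reverse=True)
            del self.clusters[self.max_clusters:]


clusterer = StreamMicroClusterer(eps=1.0, lam=0.01, min_weight=0.5, max_clusters=50)
stream = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (0.1, 0.0)]
for t, point in enumerate(stream):
    clusterer.insert(point, now=float(t))
print(len(clusterer.clusters))  # 2
```

Each point is touched once and memory never exceeds `max_clusters` summaries, which is how this kind of design respects the single-pass and bounded-memory constraints. A complete density micro-clustering algorithm would add an offline step that groups the micro-cluster centers into final clusters (DenStream, for example, uses a DBSCAN variant for this); that step is omitted here.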

Similar articles

Benjamin W. Tobin, Benjamin V. Miller, Matthew L. Niemiller and Andrea M. Erhardt    
Karst aquifers are unique among groundwater systems because of variable permeability and flow-path organization changes resulting from dissolution processes. Over time, changes in flow-path connectivity complicate interpretations of conduit network evolu...
Journal: Hydrology

 
Beata Baziak, Marek Bodziony and Robert Szczepanek    
Machine learning models facilitate the search for non-linear relationships when modeling hydrological processes, but they are equally effective for automation at the data preparation stage. The tasks for which automation was analyzed consisted of estimat...
Journal: Hydrology

 
Donghae Baek, Il Won Seo, Jun Song Kim, Sung Hyun Jung and Yuyoung Choi    
The dispersion coefficients are crucial in understanding the spreading of pollutant clouds in river flows, particularly in the context of the depth-averaged two-dimensional (2D) advection–dispersion equation (ADE). Traditionally, the 2D stream-tube routi...
Journal: Water

 
Suiji Wang    
An anastomosing river is a stable multiple-channel system separated by inter-channel wetlands, and there are serious difficulties in observing the hydrodynamics of such river patterns in situ. Therefore, there are few reports on the hydrodynamic data of ...
Journal: Water

 
Maryam Badar and Marco Fisichella    
Fairness-aware mining of data streams is a challenging concern in the contemporary domain of machine learning. Many stream learning algorithms are used to replace humans in critical decision-making processes, e.g., hiring staff, assessing credit risk, et...